1. Introduction



K. Latuszynski et al. 



Let $\pi$ be a probability distribution on a Polish space $\mathcal{X}$ and $f : \mathcal{X} \to \mathbb{R}$ be a Borel function. The objective is to compute (estimate) the quantity

$$\theta := \pi(f) = \int_{\mathcal{X}} \pi(\mathrm{d}x)\, f(x).$$

Typically $\mathcal{X}$ is a high dimensional space, $f$ need not be bounded and the density of $\pi$ is known only up to a normalizing constant. Such problems arise in Bayesian inference and are often solved using Markov chain Monte Carlo (MCMC) methods. The idea is to simulate a Markov chain $(X_n)$ with transition kernel $P$ such that $\pi P = \pi$, that is, $\pi$ is stationary with respect to $P$. Then averages along the trajectory of the chain,

$$\hat\theta_n := \frac{1}{n} \sum_{i=0}^{n-1} f(X_i),$$

are used to estimate $\theta$. It is essential to have explicit and reliable bounds which provide information about how long the algorithms must be run to achieve a prescribed level of accuracy (cf. [Ros95a, JH01, JHCN06]). The aim of our paper is to derive non-asymptotic and explicit bounds on the mean square error,

$$\mathrm{MSE} := \mathbb{E}(\hat\theta_n - \theta)^2. \tag{1.1}$$

To upper bound (1.1), we begin with a general inequality valid for all ergodic Markov chains that admit a one step small set condition. Our bound is sharp in the sense that the leading term is exactly $\sigma^2_{\mathrm{as}}(P,f)/n$, where $\sigma^2_{\mathrm{as}}(P,f)$ is the asymptotic variance in the central limit theorem. The proof relies on the regeneration technique, methods of renewal theory and statistical sequential analysis.

To obtain explicit bounds we subsequently consider geometrically and polynomially ergodic Markov chains. We assume appropriate drift conditions that give quantitative information about the transition kernel $P$. The upper bounds on MSE are then stated in terms of the drift parameters.

We note that most MCMC algorithms implemented in Bayesian inference are geometrically or polynomially ergodic (however, establishing the quantitative drift conditions we utilize may be prohibitively difficult for complicated models). Uniform ergodicity is stronger than the geometric ergodicity considered here and is often discussed in the literature. However, few MCMC algorithms used in practice are uniformly ergodic. MSE and confidence estimation for uniformly ergodic chains are discussed in our accompanying paper [LMN11].

The subgeometric condition, considered in e.g. [DGM08], is more general than the polynomial ergodicity considered here. We note that with some additional effort, the results for polynomially ergodic chains (Section 5) can be reformulated for subgeometric Markov chains. Motivated by applications, we avoid these technical difficulties.

Upper bounding the mean square error (1.1) leads immediately to confidence estimation by applying the Chebyshev inequality. One can also apply the more sophisticated median trick of [JVV86], further developed in [NP09]. The median trick leads to an exponential inequality for the MCMC estimate whenever the MSE can be upper bounded, in particular in the setting of geometrically and polynomially ergodic chains.
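
The passage from an MSE bound to a confidence statement can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the single-run failure probability comes from the Chebyshev inequality, and the number of independent runs is sized with Hoeffding's inequality for the binomial count of failed runs, which is a generic choice and not necessarily the constant used in [JVV86] or [NP09].

```python
import math


def chebyshev_failure_prob(mse_bound: float, eps: float) -> float:
    """Chebyshev: P(|theta_hat - theta| >= eps) <= MSE / eps^2, capped at 1."""
    return min(1.0, mse_bound / eps ** 2)


def runs_for_median_trick(a: float, alpha: float) -> int:
    """Smallest odd m with exp(-2 m (1/2 - a)^2) <= alpha.

    Each run misses the eps-interval with probability at most a < 1/2.
    The median of m runs can only miss if at least half of the runs miss,
    and Hoeffding's inequality bounds that event by exp(-2 m (1/2 - a)^2).
    """
    if not 0.0 <= a < 0.5:
        raise ValueError("single-run failure probability must be below 1/2")
    m = max(1, math.ceil(math.log(1.0 / alpha) / (2.0 * (0.5 - a) ** 2)))
    return m if m % 2 == 1 else m + 1


def median_of_runs(estimates):
    """Median of an odd number of independent MCMC estimates."""
    ordered = sorted(estimates)
    return ordered[len(ordered) // 2]
```

For instance, an MSE bound of 0.0025 with precision 0.1 gives a single-run failure probability of at most 0.25, and `runs_for_median_trick(0.25, 0.05)` then sizes the number of independent runs needed for 95% confidence.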

We illustrate our results with benchmark examples. The first, which is related to a simplified hierarchical Bayesian model and similar to [JH01, Example 2], allows us to compare the bounds provided in our paper with actual MCMC errors. Next, we demonstrate how to apply our results in the Poisson-Gamma model of [GS90]. Finally, the contracting normals toy example allows for a numerical comparison with our earlier work [LN11].

The paper is organised as follows: in Section 2 we give background on the regeneration technique and introduce notation. The general MSE upper bound is derived in Section 3. Geometrically and polynomially ergodic Markov chains are considered in Sections 4 and 5 respectively. The applicability of our results is discussed in Section 6, where numerical examples are also presented. Technical proofs are deferred to Sections 7 and 8.

1.1. Related nonasymptotic results 

A vast literature on nonasymptotic analysis of Markov chains is available in various 
settings. To place our results in this context we give a brief account. 

In the case of finite state space, an approach based on the spectral decomposition was used in [Ald87, Gil98, LP04, NP09] to derive results of related type.

For bounded functionals of uniformly ergodic chains on a general state space, exponential inequalities with explicit constants such as those in [GO02, KLMM05] can be applied to derive confidence bounds. In the accompanying paper [LMN11] we compare the simulation cost of confidence estimation based on our approach (MSE bounds with the median trick) to exponential inequalities and conclude that while exponential inequalities have sharper constants, our approach gives in this setting the optimal dependence on the regeneration rate $\beta$ and therefore will turn out more efficient in many practical examples.

Related results also come from studying the concentration of measure phenomenon for dependent random variables. For the large body of work in this area see e.g. [Mar96], [Sam00] and [KR08] (and references therein), where transportation inequalities or a martingale approach have been used. These results, motivated in a more general setting, are valid for Lipschitz functions with respect to the Hamming metric. They also include expressions of the form $\sup_{x,y\in\mathcal{X}} \|P^i(x,\cdot) - P^i(y,\cdot)\|_{\mathrm{tv}}$ and, when applied to our setting, they are well suited for bounded functionals of uniformly ergodic Markov chains, but cannot be applied to geometrically ergodic chains. For details we refer to the original papers and the discussion in Section 3.5 of [Ada08].

For lazy reversible Markov chains, nonasymptotic mean square error bounds have been obtained for bounded target functions in [Rud09] in a setting where explicit bounds on conductance are available. These results have been applied to approximating integrals over balls in $\mathbb{R}^d$ under some regularity conditions for the stationary measure, see [Rud09] for details. The Markov chains considered there are in fact uniformly ergodic, however in their setting the regeneration rate $\beta$ can be verified for $P^h$, $h > 1$, rather than for $P$ and turns out to be exponentially small in dimension. Hence conductance seems to be the natural approach to make the problem tractable in high dimensions.

Tail inequalities for bounded functionals of Markov chains that are not uniformly ergodic were considered in [Cle01], [Ada08] and [DGM08] using regeneration techniques. These results apply e.g. to geometrically or subgeometrically ergodic Markov chains, however they also involve non-explicit constants or require tractability of moment conditions of random tours between regenerations. Computing explicit bounds from these results may be possible with additional work, but we do not pursue it here.

Nonasymptotic analysis of unbounded functionals of Markov chains is scarce. In particular, tail inequalities for an unbounded target function $f$ that can be applied to geometrically ergodic Markov chains have been established by Bertail and Clémençon in [BC10] by a regenerative approach and using truncation arguments. However, they involve non-explicit constants and cannot be directly applied to confidence estimation. Nonasymptotic and explicit MSE bounds for geometrically ergodic MCMC samplers have been obtained in [LN11] under a geometric drift condition by exploiting computable convergence rates. Our present paper improves these results in a fundamental way. Firstly, the generic Theorem 3.1 allows us to extend the approach to different classes of Markov chains, e.g. polynomially ergodic in Section 5. Secondly, rather than resting on computable convergence rates, the present approach relies on upper-bounding the CLT asymptotic variance which, somewhat surprisingly, appears to be more accurate and consequently the MSE bound is much sharper, as demonstrated by numerical examples in Section 6.

Recent work [JO10] addresses error estimates for MCMC algorithms under a positive curvature condition. The positive curvature implies geometric ergodicity in the Wasserstein distance and bivariate drift conditions (cf. [RR01]). Their approach appears to be applicable in different settings to ours and also rests on different notions, e.g. it employs the coarse diffusion constant instead of the exact asymptotic variance. Moreover, the target function $f$ is assumed to be Lipschitz, which is problematic in Bayesian inference. Therefore our results and [JO10] appear to be complementary.

Nonasymptotic rates of convergence of geometrically, polynomially and subgeometrically ergodic Markov chains to their stationary distributions have been investigated in many papers [MT94, Ros95b, RT99, Ros02, JH04, For03, DMR04, Bax05, FM03b, DMS07, RR11] under assumptions similar to our Sections 4 and 5, together with an aperiodicity condition that is not needed for our purposes. Such results, although of utmost theoretical importance, do not directly translate into bounds on accuracy of estimation, as they allow one to control only the bias of estimates and the so-called burn-in time.

2. Regeneration Construction and Notation 

Assume $P$ has invariant distribution $\pi$ on $\mathcal{X}$, is $\pi$-irreducible and Harris recurrent. The following one step small set Assumption 2.1 is verifiable for virtually all Markov chains targeting Bayesian posterior distributions. It allows for the regeneration/split construction of Nummelin [Num78] and Athreya and Ney [AN78].

2.1 Assumption (Small Set). There exist a Borel set $J \subseteq \mathcal{X}$ of positive $\pi$ measure, a number $\beta > 0$ and a probability measure $\nu$ such that

$$P(x,\cdot) \ge \beta\,\mathbb{I}(x \in J)\,\nu(\cdot).$$

Under Assumption 2.1 we can define a bivariate Markov chain $(X_n, \Gamma_n)$ on the space $\mathcal{X} \times \{0,1\}$ in the following way. The bell variable $\Gamma_{n-1}$ depends only on $X_{n-1}$ via

$$\mathbb{P}(\Gamma_{n-1} = 1 \mid X_{n-1} = x) = \beta\,\mathbb{I}(x \in J). \tag{2.2}$$

The rule of transition from $(X_{n-1}, \Gamma_{n-1})$ to $X_n$ is given by

$$\mathbb{P}(X_n \in A \mid \Gamma_{n-1} = 1, X_{n-1} = x) = \nu(A),$$
$$\mathbb{P}(X_n \in A \mid \Gamma_{n-1} = 0, X_{n-1} = x) = Q(x,A),$$

where $Q$ is the normalized "residual" kernel given by

$$Q(x,\cdot) := \frac{P(x,\cdot) - \beta\,\mathbb{I}(x \in J)\,\nu(\cdot)}{1 - \beta\,\mathbb{I}(x \in J)}.$$

Whenever $\Gamma_{n-1} = 1$, the chain regenerates at moment $n$. The regeneration epochs are

$$T := T_1 := \min\{n \ge 1 : \Gamma_{n-1} = 1\},$$
$$T_k := \min\{n > T_{k-1} : \Gamma_{n-1} = 1\}.$$

Write $\tau_k := T_k - T_{k-1}$ for $k = 2, 3, \ldots$ and $\tau_1 := T$. Random blocks

$$\Xi := \Xi_1 := (X_0, \ldots, X_{T-1}, T),$$
$$\Xi_k := (X_{T_{k-1}}, \ldots, X_{T_k - 1}, \tau_k)$$

for $k = 1, 2, 3, \ldots$ are independent.

We note that the numbering of the bell variables $\Gamma_n$ may differ between authors: in our notation $\Gamma_{n-1} = 1$ indicates regeneration at moment $n$, not $n-1$. Let symbols $\mathbb{P}_\xi$ and $\mathbb{E}_\xi$ mean that $X_0 \sim \xi$. Note also that these symbols are unambiguous, because specifying the distribution of $X_0$ is equivalent to specifying the joint distribution of $(X_0, \Gamma_0)$ via (2.2).

For $k = 2, 3, \ldots$, every block $\Xi_k$ under $\mathbb{P}_\xi$ has the same distribution as $\Xi$ under $\mathbb{P}_\nu$. However, the distribution of $\Xi$ under $\mathbb{P}_\xi$ is in general different. We will also use the following notations for the block sums:

$$\Xi(f) := \sum_{i=0}^{T-1} f(X_i), \qquad \Xi_k(f) := \sum_{i=T_{k-1}}^{T_k - 1} f(X_i).$$
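
To make the split construction concrete, the following minimal sketch builds $\beta$, $\nu$ and the residual kernel $Q$ for a finite state space and simulates the pair (state, bell variable). Everything specific here is an assumption for illustration: the three-state kernel in the usage note, the function names, and the choice of $\nu$ as the normalized columnwise minimum of $P$ over $J$ (one valid minorization among many).

```python
import random


def split_kernels(P, J):
    """Return (beta, nu, Q) for a finite kernel P (list of rows) and small set J.

    nu is the normalized columnwise minimum of P over J, so that
    P(x, .) >= beta * nu(.) for every x in J, and Q is the residual kernel.
    """
    n = len(P)
    mins = [min(P[x][y] for x in J) for y in range(n)]  # equals beta * nu(y)
    beta = sum(mins)
    if not 0.0 < beta < 1.0:
        raise ValueError("this sketch needs 0 < beta < 1")
    nu = [m / beta for m in mins]
    Q = []
    for x in range(n):
        if x in J:
            Q.append([(P[x][y] - mins[y]) / (1.0 - beta) for y in range(n)])
        else:
            Q.append(list(P[x]))  # outside J the residual kernel is P itself
    return beta, nu, Q


def simulate_split_chain(P, J, x0, n_steps, rng):
    """Simulate (X_n, Gamma_n); bells[n-1] == 1 means regeneration at time n."""
    beta, nu, Q = split_kernels(P, J)
    states = list(range(len(P)))
    x, path, bells = x0, [x0], []
    for _ in range(n_steps):
        gamma = 1 if (x in J and rng.random() < beta) else 0
        weights = nu if gamma == 1 else Q[x]
        x = rng.choices(states, weights=weights)[0]
        bells.append(gamma)
        path.append(x)
    return path, bells
```

A possible usage, with a hypothetical kernel: `P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]`, `J = {0, 1}` and `simulate_split_chain(P, J, 0, 1000, random.Random(0))`. The regeneration epochs are then the indices `n` with `bells[n - 1] == 1`.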
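3. A General Inequality for the MSE

We assume that $X_0 \sim \xi$ and thus $X_n \sim \xi P^n$. Write $\bar f := f - \pi(f)$.

3.1 Theorem. If Assumption 2.1 holds then

$$\sqrt{\mathbb{E}_\xi(\hat\theta_n - \theta)^2} \le \frac{\sigma_{\mathrm{as}}(P,f)}{\sqrt{n}}\left(1 + \frac{C_0(P)}{n}\right) + \frac{C_1(P,f)}{n} + \frac{C_2(P,f)}{n}, \tag{3.2}$$

where

$$\sigma^2_{\mathrm{as}}(P,f) := \frac{\mathbb{E}_\nu(\Xi(\bar f))^2}{\mathbb{E}_\nu T}, \tag{3.3}$$

$$C_0(P) := \mathbb{E}_\pi T - \frac{1}{2}, \tag{3.4}$$

$$C_1(P,f) := \sqrt{\mathbb{E}_\xi(\Xi(|\bar f|))^2}, \tag{3.5}$$

$$C_2(P,f) = C_2(P,f,n) := \sqrt{\mathbb{E}_\xi\left(\mathbb{I}(T_1 < n) \sum_{i=n}^{T_{R(n)}-1} |\bar f|(X_i)\right)^2}, \tag{3.6}$$

$$R(n) := \min\{r \ge 1 : T_r > n\}. \tag{3.7}$$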

3.8 REMARK. The bound in Theorem 3.1 is meaningful only if $\sigma^2_{\mathrm{as}}(P,f) < \infty$, $C_0(P) < \infty$, $C_1(P,f) < \infty$ and $C_2(P,f) < \infty$. Under Assumption 2.1 we always have $\mathbb{E}_\nu T < \infty$ but not necessarily $\mathbb{E}_\nu T^2 < \infty$. On the other hand, finiteness of $\mathbb{E}_\nu(\Xi(\bar f))^2$ is a sufficient and necessary condition for the CLT to hold for the Markov chain $X_n$ and function $f$. This fact is proved in [BLL08] in a more general setting. For our purposes it is important to note that $\sigma^2_{\mathrm{as}}(P,f)$ in Theorem 3.1 is indeed the asymptotic variance which appears in the CLT, that is

$$\sqrt{n}\,(\hat\theta_n - \theta) \to_d N(0, \sigma^2_{\mathrm{as}}(P,f)).$$

Moreover,

$$\lim_{n\to\infty} n\,\mathbb{E}_\xi(\hat\theta_n - \theta)^2 = \sigma^2_{\mathrm{as}}(P,f).$$

In this sense the leading term $\sigma_{\mathrm{as}}(P,f)/\sqrt{n}$ in Theorem 3.1 is "asymptotically correct" and cannot be improved.

3.9 REMARK. Under additional assumptions of geometric and polynomial ergodicity, in Sections 4 and 5 respectively, we will derive bounds for $\sigma^2_{\mathrm{as}}(P,f)$ and $C_0(P)$, $C_1(P,f)$, $C_2(P,f)$ in terms of some explicitly computable quantities.
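
Once the four constants are available, the bound of Theorem 3.1 on the root mean square error, $\sigma_{\mathrm{as}}/\sqrt{n}\,(1 + C_0/n) + C_1/n + C_2/n$, is a one-line computation. The sketch below evaluates it and searches for the smallest run length meeting a target; the function names are hypothetical, and $C_2$ is treated as a constant in $n$, which is a simplifying assumption (the theorem allows it to depend on $n$).

```python
import math


def rmse_bound(n, sigma_as, c0, c1, c2):
    """Right-hand side of (3.2): bound on sqrt(E(theta_hat_n - theta)^2)."""
    return sigma_as / math.sqrt(n) * (1.0 + c0 / n) + c1 / n + c2 / n


def smallest_n_for_rmse(target, sigma_as, c0, c1, c2, n_max=10**9):
    """Smallest n whose bound is <= target, found by doubling then bisection.

    Relies on the bound being decreasing in n, which holds for fixed
    nonnegative constants.
    """
    hi = 1
    while rmse_bound(hi, sigma_as, c0, c1, c2) > target:
        hi *= 2
        if hi > n_max:
            raise ValueError("target not reachable below n_max")
    lo = max(1, hi // 2)
    while lo < hi:
        mid = (lo + hi) // 2
        if rmse_bound(mid, sigma_as, c0, c1, c2) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

With the "actual values" reported for the example of Section 6.1 (1.031, 0.568, 0.125 and 1.083), `rmse_bound(100, 1.031, 0.568, 0.125, 1.083)` shows how little the lower-order terms contribute next to the leading term already at moderate `n`.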
3.10 REMARK. In our related work [LMN11], we discuss a special case of the setting considered here, namely when the regeneration times $T_k$ are identifiable. This leads to $X_0 \sim \nu$ and a regenerative estimator of the form

$$\hat\theta_{T_{R(n)}} := \frac{1}{T_{R(n)}} \sum_{i=0}^{T_{R(n)}-1} f(X_i). \tag{3.11}$$

The estimator $\hat\theta_{T_{R(n)}}$ is somewhat easier to analyze. We refer to [LMN11] for details.
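Proof of Theorem 3.1. Recall $R(n)$ defined in (3.7) and let

$$\Delta(n) := T_{R(n)} - n.$$

In words: $T_{R(n)}$ is the first moment of regeneration past $n$ and $\Delta(n)$ is the overshoot or excess over $n$. Let us express the estimation error as follows.

$$\hat\theta_n - \theta = \frac{1}{n}\sum_{i=0}^{n-1} \bar f(X_i) = \frac{1}{n}\left(\sum_{i=T_1}^{T_{R(n)}-1} \bar f(X_i) + \sum_{i=0}^{T_1-1} \bar f(X_i) - \sum_{i=n}^{T_{R(n)}-1} \bar f(X_i)\right) =: \frac{1}{n}\,(Z + O_1 - O_2),$$

with the convention that $\sum_{i=l}^{u} = 0$ whenever $l > u$. The triangle inequality entails

$$\sqrt{\mathbb{E}_\xi(\hat\theta_n - \theta)^2} \le \frac{1}{n}\left(\sqrt{\mathbb{E}_\xi Z^2} + \sqrt{\mathbb{E}_\xi(O_1 - O_2)^2}\right). \tag{3.12}$$

Denote $C(P,f) := \sqrt{\mathbb{E}_\xi(O_1 - O_2)^2}$ and compute

$$\begin{aligned}
C(P,f) &= \left(\mathbb{E}_\xi\left[\left(\sum_{i=0}^{T_1-1} \bar f(X_i) - \sum_{i=n}^{T_{R(n)}-1} \bar f(X_i)\right)\mathbb{I}(T_1 \ge n) + \left(\sum_{i=0}^{T_1-1} \bar f(X_i) - \sum_{i=n}^{T_{R(n)}-1} \bar f(X_i)\right)\mathbb{I}(T_1 < n)\right]^2\right)^{1/2} \\
&\le \left(\mathbb{E}_\xi\left[\sum_{i=0}^{T_1-1} |\bar f(X_i)| + \sum_{i=n}^{T_{R(n)}-1} |\bar f(X_i)|\,\mathbb{I}(T_1 < n)\right]^2\right)^{1/2} \\
&\le \sqrt{\mathbb{E}_\xi\left(\sum_{i=0}^{T_1-1} |\bar f(X_i)|\right)^2} + \sqrt{\mathbb{E}_\xi\left(\sum_{i=n}^{T_{R(n)}-1} |\bar f(X_i)|\,\mathbb{I}(T_1 < n)\right)^2} \\
&= C_1(P,f) + C_2(P,f).
\end{aligned} \tag{3.13}$$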
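It remains to bound the middle term, $\mathbb{E}_\xi Z^2$, which clearly corresponds to the most significant portion of the estimation error. The crucial step in our proof is to show the following inequality:

$$\mathbb{E}_\nu\left(\sum_{i=0}^{T_{R(n)}-1} \bar f(X_i)\right)^2 \le \sigma^2_{\mathrm{as}}(P,f)\,(n + 2C_0(P)). \tag{3.14}$$

Once this is proved, it is easy to see that

$$\begin{aligned}
\mathbb{E}_\xi Z^2 &= \sum_{j=1}^{n} \mathbb{E}_\xi(Z^2 \mid T_1 = j)\,\mathbb{P}_\xi(T_1 = j) = \sum_{j=1}^{n} \mathbb{E}_\nu\left(\sum_{i=0}^{T_{R(n-j)}-1} \bar f(X_i)\right)^2 \mathbb{P}_\xi(T_1 = j) \\
&\le \sum_{j=1}^{n} \sigma^2_{\mathrm{as}}(P,f)\,(n - j + 2C_0(P))\,\mathbb{P}_\xi(T_1 = j) \le \sigma^2_{\mathrm{as}}(P,f)\,(n + 2C_0(P)),
\end{aligned}$$

consequently $\sqrt{\mathbb{E}_\xi Z^2} \le \sqrt{n}\,\sigma_{\mathrm{as}}(P,f)\,(1 + C_0(P)/n)$ and the conclusion will follow by recalling (3.12) and (3.13).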

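We are therefore left with the task of proving (3.14). This is essentially a statement about sums of i.i.d. random variables. Indeed,

$$\sum_{i=0}^{T_{R(n)}-1} \bar f(X_i) = \sum_{k=1}^{R(n)} \Xi_k(\bar f) \tag{3.15}$$

and all the blocks $\Xi_k$ (including $\Xi = \Xi_1$) are i.i.d. under $\mathbb{P}_\nu$. By the general version of the Kac Theorem ([MT93] Thm 10.0.1 or [Num02] equation (3.3.7)) we have

$$\mathbb{E}_\nu \Xi(f) = \pi(f)\,\mathbb{E}_\nu T$$

(and $1/\mathbb{E}_\nu T = \beta\pi(J)$), so $\mathbb{E}_\nu \Xi(\bar f) = 0$ and $\mathrm{Var}_\nu \Xi(\bar f) = \sigma^2_{\mathrm{as}}(P,f)\,\mathbb{E}_\nu T$. Now we will exploit the fact that $R(n)$ is a stopping time with respect to $\mathcal{G}_k = \sigma((\Xi_1(\bar f), \tau_1), \ldots, (\Xi_k(\bar f), \tau_k))$, a filtration generated by i.i.d. pairs. We are in a position to apply the two Wald identities. The second identity yields

$$\mathbb{E}_\nu\left(\sum_{k=1}^{R(n)} \Xi_k(\bar f)\right)^2 = \mathrm{Var}_\nu \Xi(\bar f)\,\mathbb{E}_\nu R(n) = \sigma^2_{\mathrm{as}}(P,f)\,\mathbb{E}_\nu T\,\mathbb{E}_\nu R(n).$$

But in this expression we can replace $\mathbb{E}_\nu T\,\mathbb{E}_\nu R(n)$ by $\mathbb{E}_\nu T_{R(n)}$ because of the first Wald identity:

$$\mathbb{E}_\nu T_{R(n)} = \mathbb{E}_\nu \sum_{k=1}^{R(n)} \tau_k = \mathbb{E}_\nu T\,\mathbb{E}_\nu R(n).$$

It follows that

$$\mathbb{E}_\nu\left(\sum_{k=1}^{R(n)} \Xi_k(\bar f)\right)^2 = \sigma^2_{\mathrm{as}}(P,f)\,\mathbb{E}_\nu T_{R(n)} = \sigma^2_{\mathrm{as}}(P,f)\,(n + \mathbb{E}_\nu \Delta(n)). \tag{3.16}$$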

We now focus attention on bounding the "mean overshoot" $\mathbb{E}_\nu \Delta(n)$. Under $\mathbb{P}_\nu$, the cumulative sums $T = T_1 < T_2 < \ldots < T_k < \ldots$ form a (non-delayed) renewal process in discrete time. Let us invoke the following elegant theorem of Lorden ([Lor70], Thm 1):

$$\mathbb{E}_\nu \Delta(n) \le \frac{\mathbb{E}_\nu T^2}{\mathbb{E}_\nu T}. \tag{3.17}$$

By Lemma 7.3 with $g \equiv 1$ from Section 7 we obtain:

$$\mathbb{E}_\nu \Delta(n) \le 2\,\mathbb{E}_\pi T - 1. \tag{3.18}$$

Hence substituting (3.18) into (3.16) and taking into account (3.15) we obtain (3.14) and complete the proof. □
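
The overshoot bound used at the end of the proof can be checked exactly for a renewal process with finitely supported inter-arrival times. The sketch below is an illustration only: the function names and the inter-arrival distribution in the usage note are hypothetical. It computes the mean overshoot by the renewal recursion (condition on the first inter-arrival time) and compares it with Lorden's bound, the second moment divided by the mean.

```python
def mean_overshoot(p, n):
    """Exact mean overshoot E Delta(m) for m = 0..n.

    p[k-1] = P(tau = k) for k = 1..len(p). The overshoot over level m is
    T_{R(m)} - m with R(m) = min{r : T_r > m}. Conditioning on the first
    inter-arrival time k: if k > m the overshoot is k - m, otherwise the
    process restarts at k with remaining level m - k.
    """
    D = []
    for m in range(n + 1):
        total = 0.0
        for k, pk in enumerate(p, start=1):
            total += pk * ((k - m) if k > m else D[m - k])
        D.append(total)
    return D


def lorden_bound(p):
    """E tau^2 / E tau for the inter-arrival distribution p."""
    mean = sum(pk * k for k, pk in enumerate(p, start=1))
    second = sum(pk * k * k for k, pk in enumerate(p, start=1))
    return second / mean
```

For example, with `p = [0.5, 0.2, 0.3]` every entry of `mean_overshoot(p, 50)` stays below `lorden_bound(p)`, uniformly in the level.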



4. Geometrically Ergodic Chains 

In this section we upper bound the constants $\sigma^2_{\mathrm{as}}(P,f)$, $C_0(P)$, $C_1(P,f)$, $C_2(P,f)$, appearing in Theorem 3.1, for geometrically ergodic Markov chains under a quantitative drift assumption. Proofs are deferred to Sections 7 and 8.

Using drift conditions is a standard approach for establishing geometric ergodicity. We refer to [RR04] or [MT93] for definitions and further details. The assumption below is the same as in [Bax05]. Specifically, let $J$ be the small set which appears in Assumption 2.1.

4.1 Assumption (Geometric Drift). There exist a function $V : \mathcal{X} \to [1,\infty[$, constants $\lambda < 1$ and $K < \infty$ such that

$$PV(x) := \int_{\mathcal{X}} P(x,\mathrm{d}y)\,V(y) \le \begin{cases} \lambda V(x) & \text{for } x \notin J, \\ K & \text{for } x \in J. \end{cases}$$


In many papers conditions similar to Assumption 4.1 have been established for realistic MCMC algorithms in statistical models of practical relevance [HG98, FM00, FMRR03, JH04, JJ10, RH10]. This opens the possibility of computing nonasymptotic upper bounds on MSE or nonasymptotic confidence intervals in these models.

In this section we bound the quantities appearing in Theorem 3.1 by expressions involving $\lambda$, $\beta$ and $K$. The main result in this section is the following theorem.

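4.2 Theorem. If Assumptions 2.1 and 4.1 hold and $f$ is such that

$$\|\bar f\|_{V^{1/2}} := \sup_x |\bar f(x)| / V^{1/2}(x) < \infty,$$

then

(i) $$C_0(P) \le \frac{\lambda}{1-\lambda}\,\pi(V) + \frac{K - \lambda - \beta}{\beta(1-\lambda)} + \frac{1}{2},$$

(ii) $$\frac{\sigma^2_{\mathrm{as}}(P,f)}{\|\bar f\|^2_{V^{1/2}}} \le \frac{1 + \lambda^{1/2}}{1 - \lambda^{1/2}}\,\pi(V) + \frac{2(K^{1/2} - \lambda^{1/2} - \beta)}{\beta(1 - \lambda^{1/2})}\,\pi(V^{1/2}),$$

(iii) $$\frac{C_1(P,f)^2}{\|\bar f\|^2_{V^{1/2}}} \le \frac{1}{(1 - \lambda^{1/2})^2}\,\xi(V) + \frac{2(K^{1/2} - \lambda^{1/2} - \beta)}{\beta(1 - \lambda^{1/2})^2}\,\xi(V^{1/2}) + \frac{\beta(K - \lambda - \beta) + 2(K^{1/2} - \lambda^{1/2} - \beta)^2}{\beta^2(1 - \lambda^{1/2})^2},$$

(iv) $C_2(P,f)^2$ satisfies an inequality analogous to (iii) with $\xi$ replaced by $\xi P^n$.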

4.3 REMARK. Combining Theorem 4.2 with Theorem 3.1 yields the MSE bound of interest. Note that the leading term is of order $n^{-1}\beta^{-1}(1-\lambda)^{-1}$. A related result is Proposition 2 of [FM03a] where the $p$-th moment of $\hat\theta_n$ for $p \ge 2$ is controlled under similar assumptions. Specialised to $p = 2$ the leading term of the moment bound of [FM03a] is of order $n^{-1}\beta^{-3}(1-\lambda)^{-4}$.

4.4 REMARK. An alternative form of the first bound in Theorem 4.2 is

$$C_0(P) \le \frac{\lambda^{1/2}}{1 - \lambda^{1/2}}\,\pi(V^{1/2}) + \frac{K^{1/2} - \lambda^{1/2} - \beta}{\beta(1 - \lambda^{1/2})} + \frac{1}{2}.$$

Theorem 4.2 still involves some quantities which can be difficult to compute, such as $\pi(V^{1/2})$ and $\pi(V)$, not to mention $\xi P^n(V^{1/2})$ and $\xi P^n(V)$. The following Proposition gives some simple complementary bounds.
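4.5 Proposition. Under Assumptions 2.1 and 4.1,

(i) $$\pi(V^{1/2}) \le \pi(J)\,\frac{K^{1/2} - \lambda^{1/2}}{1 - \lambda^{1/2}} \le \frac{K^{1/2} - \lambda^{1/2}}{1 - \lambda^{1/2}},$$

(ii) $$\pi(V) \le \pi(J)\,\frac{K - \lambda}{1 - \lambda} \le \frac{K - \lambda}{1 - \lambda},$$

(iii) if $\xi(V^{1/2}) \le \dfrac{K^{1/2}}{1 - \lambda^{1/2}}$ then $\xi P^n(V^{1/2}) \le \dfrac{K^{1/2}}{1 - \lambda^{1/2}}$,

(iv) if $\xi(V) \le \dfrac{K}{1 - \lambda}$ then $\xi P^n(V) \le \dfrac{K}{1 - \lambda}$,

(v) $\|\bar f\|_{V^{1/2}}$ can be related to $\|f\|_{V^{1/2}}$ by

$$\|\bar f\|_{V^{1/2}} \le \|f\|_{V^{1/2}}\left[1 + \frac{\pi(J)(K^{1/2} - \lambda^{1/2})}{(1 - \lambda^{1/2})\inf_{x \in \mathcal{X}} V^{1/2}(x)}\right] \le \|f\|_{V^{1/2}}\left[1 + \frac{K^{1/2} - \lambda^{1/2}}{1 - \lambda^{1/2}}\right].$$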
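
The bounds of this section combine mechanically: a drift triple ($\lambda$, $K$, $\beta$) gives bounds on the stationary moments of $V^{1/2}$ and $V$, which in turn feed the asymptotic variance bound. The sketch below is a hedged illustration with hypothetical function names. It assumes the moment bounds have the form $\pi(J)(K^{1/2} - \lambda^{1/2})/(1 - \lambda^{1/2})$ and $\pi(J)(K - \lambda)/(1 - \lambda)$, and that the variance bound is $\frac{1+\lambda^{1/2}}{1-\lambda^{1/2}}\pi(V) + \frac{2(K^{1/2}-\lambda^{1/2}-\beta)}{\beta(1-\lambda^{1/2})}\pi(V^{1/2})$, scaled by the squared $V^{1/2}$-norm of the centered function.

```python
import math


def stationary_moment_bounds(lam, K, pi_J=1.0):
    """Upper bounds on pi(V^{1/2}) and pi(V) from the drift constants.

    pi_J defaults to 1, the crude choice when pi(J) is unknown.
    """
    s_lam, s_K = math.sqrt(lam), math.sqrt(K)
    pi_v_half = pi_J * (s_K - s_lam) / (1.0 - s_lam)
    pi_v = pi_J * (K - lam) / (1.0 - lam)
    return pi_v_half, pi_v


def asymptotic_variance_bound(lam, K, beta, pi_v, pi_v_half, f_norm=1.0):
    """Upper bound on sigma_as^2(P, f) under geometric drift.

    f_norm is the V^{1/2}-norm of the centered function f - pi(f).
    """
    s_lam, s_K = math.sqrt(lam), math.sqrt(K)
    term_v = (1.0 + s_lam) / (1.0 - s_lam) * pi_v
    term_v_half = 2.0 * (s_K - s_lam - beta) / (beta * (1.0 - s_lam)) * pi_v_half
    return f_norm ** 2 * (term_v + term_v_half)
```

As a usage example with made-up constants, `stationary_moment_bounds(0.25, 4.0)` followed by `asymptotic_variance_bound(0.25, 4.0, 0.5, pi_v, pi_v_half)` gives a variance bound that depends only on $\lambda$, $K$ and $\beta$.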



4.6 REMARK. In MCMC practice the initial state is almost always chosen deterministically, $\xi = \delta_x$ for some $x \in \mathcal{X}$. In this case in (iii) and (iv) we just have to choose $x$ such that $V^{1/2}(x) \le K^{1/2}/(1 - \lambda^{1/2})$ and $V(x) \le K/(1 - \lambda)$, respectively (note that the latter inequality implies the former). It might be interesting to note that our bounds would not be improved if we added a burn-in time $t > 0$ at the beginning of simulation. The standard practice in MCMC computations is to discard the initial part of the trajectory and use the estimator

$$\hat\theta_{t,n} := \frac{1}{n}\sum_{i=t}^{t+n-1} f(X_i).$$

The heuristic justification is that the closer $\xi P^t$ is to the equilibrium distribution $\pi$, the better. However, for technical reasons, our upper bounds on error are the tightest if the initial point has the smallest value of $V$, and not if its distribution is close to $\pi$.

4.7 REMARK. In many specific examples one can obtain (with some additional effort) sharper inequalities than those in Proposition 4.5 or at least bound $\pi(J)$ away from 1. However, in general we assume that such bounds are not available.



5. Polynomially ergodic Markov chains 

In this section we upper bound the constants $\sigma^2_{\mathrm{as}}(P,f)$, $C_0(P)$, $C_1(P,f)$, $C_2(P,f)$, appearing in Theorem 3.1, for polynomially ergodic Markov chains under a quantitative drift assumption. Proofs are deferred to Sections 7 and 8.

The following drift condition is a counterpart of the drift in Assumption 4.1, and is used to establish polynomial ergodicity of Markov chains [JR02, DFMS04, DGM08, MT93].
5.1 Assumption (Polynomial Drift). There exist a function $V : \mathcal{X} \to [1,\infty[$, constants $\lambda < 1$, $\alpha \le 1$ and $K < \infty$ such that

$$PV(x) \le \begin{cases} V(x) - (1 - \lambda)V(x)^{\alpha} & \text{for } x \notin J, \\ K & \text{for } x \in J. \end{cases}$$


We note that Assumption 5.1 or closely related drift conditions have been established for MCMC samplers in specific models used in Bayesian inference, including independence samplers, random-walk Metropolis algorithms, Langevin algorithms and Gibbs samplers, see e.g. [FM00, JT03, JR07].

In this section we bound the quantities appearing in Theorem 3.1 by expressions involving $\lambda$, $\beta$, $\alpha$ and $K$. The main result in this section is the following theorem.

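5.2 Theorem. If Assumptions 2.1 and 5.1 hold with $\alpha > \tfrac{2}{3}$ and $f$ is such that $\|\bar f\|_{V^{\frac{3}{2}\alpha - 1}} := \sup_x |\bar f(x)| / V^{\frac{3}{2}\alpha - 1}(x) < \infty$, then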

1 if" -1-/311 

a(l — A) pa(l — A) P 2 

^ ' - (2a-l)(l-A)^^'' ^+a2(l-A)2^^^ ^ 

8ift-8-8^, 4-4/3 a(l-A) + 4 ^ A"2"-i - 1 - /3 



a2;3(l-A)2 a/3(l-A)y^' ' a/3(l - A) (2a - 1)/3(1 - A) 
4(A"- 1-^) ^2 /2At -2-2/3 ^ ly ^ f2K'i -2-2/3 ^ 1 



q2^(1-A)2 V a/3(l-A) ;3y V a/3(l - A) /3 



4a-2 



. C2(P,/)^ , 1 / A-A \^ 4(A-A) 

^'""^ - (2a-l)/3^(l-A) Ii-aJ +a2/3(l-Ar 



8At -8-8/3 4-4/3 A A- A a(l - A) + 4 A^""! - 1 - 



a2/3(l-A)2 a/3(l - A) ; V^(l - A) a/^(l - A) (2a - 1)/3(1 - A) 
4(A"-l-/3) , /2At-2-2;3 ^ l^^^ ^/^2At-2-2/3 ^ 1 



a2/3(l-A)2 V a/3(l-A) / V «/?(!- A) /3^ 
5.3 REMARK. A counterpart of Theorem 5.2 parts (i)-(iii) for ^ < a < | and
functions s.t. < oo can also be established, using correspondingly modified but
analogous calculations to those in the proof of the above. For part (iv), however, an additional
assumption $\pi(V) < \infty$ is necessary.

Theorem 5.2 still involves some quantities depending on $\pi$ which can be difficult to compute, such as $\pi(V^{\eta})$ for $\eta \le \alpha$. The following Proposition gives some simple complementary bounds.

5.4 Proposition. Under Assumptions 2.1 and 5.1, 
(i) For rj < a we have 

(a) If rj < a then can be related to \\f\\vi by 

ll/lk" < ll/lk" 



6. Applicability in Bayesian Inference and Examples 

To apply the current results for computing the MSE of estimates arising in Bayesian inference one needs drift and small set conditions with explicit constants. The quality of these constants will affect the tightness of the overall MSE bound. In this section we present three numerical examples. In Subsection 6.1, a simplified hierarchical model similar to [JH01, Example 2] is designed to compare the bounds with actual values and assess their quality. Next, in Subsection 6.2, we upper bound the MSE in the Poisson-Gamma hierarchical model, which is extensively discussed in the literature. Finally, in Subsection 6.3, we present the contracting normals toy example to demonstrate numerical improvements over [LN11].

In realistic statistical models the explicit drift conditions required for our analysis are very difficult to establish. Nevertheless, they have been recently obtained for a wide range of complex models of practical interest. Particular examples include: Gibbs sampling for hierarchical random effects models in [JH04]; van Dyk and Meng's algorithm for the multivariate Student's t model [MH04]; Gibbs sampling for a family of Bayesian hierarchical general linear models in [JJ07] (cf. also [JJ10]); block Gibbs sampling for Bayesian random effects models with improper priors [TH09]; the Data Augmentation algorithm for Bayesian multivariate regression models with Student's t regression errors [RH10]. Moreover, a large body of related work has been devoted to establishing a drift condition together with a small set to enable regenerative simulation for classes of statistical models. Results of this kind, pursued in a number of papers mainly by James P. Hobert, Galin L. Jones and their coauthors, cannot be used directly for our purposes, but may provide substantial help in establishing the quantitative drift and regeneration conditions required here.
In settings where the existence of drift conditions can be established, but explicit constants cannot be computed (cf. e.g. [FMRR03, PR08]), our results do not apply and one must validate MCMC by asymptotic arguments. This is not surprising since qualitative existence results are not well suited for deriving quantitative finite sample conclusions.

6.1. A Simplified Hierarchical Model 

The simulation experiments described below are designed to compare the bounds proved in this paper with actual errors of MCMC estimation. We use a simple example similar to [JH01, Example 2]. Assume that $y = (y_1, \ldots, y_t)$ is an i.i.d. sample from the normal distribution $N(\mu, \kappa^{-1})$, where $\kappa$ denotes the reciprocal of the variance. Thus we have

$$p(y|\mu,\kappa) = p(y_1, \ldots, y_t|\mu,\kappa) \propto \kappa^{t/2}\exp\left[-\frac{\kappa}{2}\sum_{j=1}^{t}(y_j - \mu)^2\right].$$

The pair $(\mu, \kappa)$ plays the role of an unknown parameter. To make things simple, let us use the Jeffreys non-informative (improper) prior $p(\mu,\kappa) = p(\mu)p(\kappa) \propto \kappa^{-1}$ (in [JH01] a different prior is considered). The posterior density is

$$p(\mu,\kappa|y) \propto p(y|\mu,\kappa)\,p(\mu,\kappa) \propto \kappa^{t/2-1}\exp\left[-\frac{\kappa t}{2}\left(s^2 + (\bar y - \mu)^2\right)\right],$$

where

$$\bar y = \frac{1}{t}\sum_{j=1}^{t} y_j, \qquad s^2 = \frac{1}{t}\sum_{j=1}^{t}(y_j - \bar y)^2.$$

Note that $\bar y$ and $s^2$ only determine the location and scale of the posterior. We will be using a Gibbs sampler, whose performance does not depend on scale and location, therefore without loss of generality we can assume that $\bar y = 0$ and $s^2 = t$. Since $y = (y_1, \ldots, y_t)$ is kept fixed, let us slightly abuse notation by using symbols $p(\kappa|\mu)$, $p(\mu|\kappa)$ and $p(\mu)$ for $p(\kappa|\mu,y)$, $p(\mu|\kappa,y)$ and $p(\mu|y)$, respectively. The Gibbs sampler alternates between drawing samples from both conditionals. Start with some $(\mu_0, \kappa_0)$. Then, for $i = 1, 2, \ldots$,

• $\kappa_i \sim \mathrm{Gamma}\left(t/2,\ (t/2)(s^2 + \mu_{i-1}^2)\right)$,

• $\mu_i \sim N\left(0,\ 1/(\kappa_i t)\right)$.
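
The two-step Gibbs sampler of this subsection (a Gamma draw for the precision given the previous mean, then a normal draw for the mean) can be sketched as follows. The function name, the seed handling and the use of Python's standard `random` module are choices made for this illustration. The second parameter of the Gamma distribution is taken to be a rate, so it is inverted for `gammavariate`, which expects a scale, and the data are standardised so that the sample mean is 0 and $s^2 = t$.

```python
import random


def gibbs_chain(t, n, mu0=0.0, seed=0):
    """Return mu_0, ..., mu_{n-1} from the two-step Gibbs sampler.

    kappa_i ~ Gamma(shape=t/2, rate=(t/2)(s^2 + mu_{i-1}^2)) with s^2 = t,
    mu_i    ~ N(0, 1/(kappa_i * t)).
    """
    rng = random.Random(seed)
    mu, path = mu0, [mu0]
    for _ in range(n - 1):
        rate = 0.5 * t * (t + mu * mu)
        kappa = rng.gammavariate(t / 2.0, 1.0 / rate)  # second argument is a scale
        mu = rng.gauss(0.0, (1.0 / (kappa * t)) ** 0.5)
        path.append(mu)
    return path


def ergodic_average(path):
    """The MCMC estimate of the posterior mean of mu."""
    return sum(path) / len(path)
```

A quick sanity check is that the empirical second moment of a long path should be close to the stationary value $t/(t-3)$ derived later in this subsection.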
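If we are chiefly interested in $\mu$ then it is convenient to consider the two small steps $\mu_{i-1} \to \kappa_i \to \mu_i$ together. The transition density is

$$\begin{aligned}
p(\mu_i|\mu_{i-1}) &= \int p(\mu_i|\kappa)\,p(\kappa|\mu_{i-1})\,\mathrm{d}\kappa \\
&\propto \int_0^\infty \kappa^{1/2}\exp\left[-\frac{\kappa t}{2}\mu_i^2\right](s^2 + \mu_{i-1}^2)^{t/2}\,\kappa^{t/2-1}\exp\left[-\frac{\kappa t}{2}(s^2 + \mu_{i-1}^2)\right]\mathrm{d}\kappa \\
&= (s^2 + \mu_{i-1}^2)^{t/2}\int_0^\infty \kappa^{(t-1)/2}\exp\left[-\frac{\kappa t}{2}(s^2 + \mu_{i-1}^2 + \mu_i^2)\right]\mathrm{d}\kappa \\
&\propto (s^2 + \mu_{i-1}^2)^{t/2}\,(s^2 + \mu_{i-1}^2 + \mu_i^2)^{-(t+1)/2}.
\end{aligned}$$

The proportionality constants concealed behind the $\propto$ sign depend only on $t$. Finally we fix the scale, letting $s^2 = t$, and get

$$p(\mu_i|\mu_{i-1}) \propto \left(1 + \frac{\mu_{i-1}^2}{t}\right)^{t/2}\left(1 + \frac{\mu_{i-1}^2}{t} + \frac{\mu_i^2}{t}\right)^{-(t+1)/2}. \tag{6.1}$$

If we consider the RHS of (6.1) as a function of $\mu_i$ only, we can regard the first factor as constant and write

$$p(\mu_i|\mu_{i-1}) \propto \left(1 + \left(1 + \frac{\mu_{i-1}^2}{t}\right)^{-1}\frac{\mu_i^2}{t}\right)^{-(t+1)/2}.$$

It is clear that the conditional distribution of the random variable

$$\mu_i\left(1 + \frac{\mu_{i-1}^2}{t}\right)^{-1/2} \tag{6.2}$$

is the t-Student distribution with $t$ degrees of freedom. Therefore, since the t-distribution has the second moment equal to $t/(t-2)$ for $t > 2$, we infer that

$$\mathbb{E}(\mu_i^2|\mu_{i-1}) = \frac{t + \mu_{i-1}^2}{t - 2}.$$

Similar computation shows that the posterior marginal density of $\mu$ satisfies

$$p(\mu) \propto \left(1 + \frac{t-1}{t}\,\frac{\mu^2}{t-1}\right)^{-t/2}.$$

Thus the stationary distribution of our Gibbs sampler is the rescaled t-Student distribution with $t - 1$ degrees of freedom. Consequently we have

$$\mathbb{E}_\pi \mu^2 = \frac{t}{t-3}.$$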
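6.3 Proposition (Drift). Assume that $t > 4$. Let $V(\mu) := \mu^2 + 1$ and $J = [-a, a]$. The transition kernel of the (2-step) Gibbs sampler satisfies

$$PV(\mu) \le \begin{cases} \lambda V(\mu) & \text{for } |\mu| > a, \\ K & \text{for } |\mu| \le a, \end{cases} \qquad \text{provided that } a > \sqrt{t/(t-3)}.$$

The quantities $\lambda$, $K$ and $\pi(V)$ are given by

$$\lambda = \frac{1}{t-2}\left(\frac{2t-3}{1+a^2} + 1\right), \qquad K = 2 + \frac{a^2 + 2}{t-2} \qquad \text{and} \qquad \pi(V) = \frac{2t-3}{t-3}.$$

Proof. Since $a > \sqrt{t/(t-3)}$ we obtain that $\lambda = \frac{1}{t-2}\left(\frac{2t-3}{1+a^2} + 1\right) < \frac{1}{t-2}(t-2) = 1$. Using the fact that

$$PV(\mu) = \mathbb{E}(\mu_i^2 + 1 \mid \mu_{i-1} = \mu) = \frac{t + \mu^2}{t-2} + 1$$

we obtain

$$\begin{aligned}
\lambda V(\mu) - PV(\mu) &= \frac{1}{t-2}\left(\frac{2t-3}{1+a^2} + 1\right)(\mu^2 + 1) - \frac{\mu^2 + 2t - 2}{t-2} \\
&= \frac{1}{t-2}\left(\frac{2t-3}{1+a^2}\,\mu^2 + \frac{2t-3}{1+a^2} - 2t + 3\right) \\
&= \frac{2t-3}{(t-2)(1+a^2)}\,(\mu^2 - a^2).
\end{aligned}$$

Hence $\lambda V(\mu) - PV(\mu) > 0$ for $|\mu| > a$. For $\mu$ such that $|\mu| \le a$ we get that

$$PV(\mu) = \frac{t + \mu^2}{t-2} + 1 \le \frac{t + a^2}{t-2} + 1 = 2 + \frac{a^2 + 2}{t-2}.$$

Finally

$$\pi(V) = \mathbb{E}_\pi \mu^2 + 1 = \frac{t}{t-3} + 1 = \frac{2t-3}{t-3}.$$

□

6.4 Proposition (Minorization). Let $p_{\min}$ be a subprobability density given by

$$p_{\min}(\mu) = \begin{cases} p(\mu|a) & \text{for } |\mu| \le h(a), \\ p(\mu|0) & \text{for } |\mu| > h(a), \end{cases}$$

where $p(\cdot|\cdot)$ is the transition density given by (6.1) and

$$h(a) = \left\{a^2\left[\left(1 + \frac{a^2}{t}\right)^{t/(t+1)} - 1\right]^{-1} - t\right\}^{1/2}.$$

Then $|\mu_{i-1}| \le a$ implies $p(\mu_i|\mu_{i-1}) \ge p_{\min}(\mu_i)$. Consequently, if we take for $\nu$ the probability measure with the normalized density $p_{\min}/\beta$ then the small set Assumption 2.1 holds for $J = [-a, a]$. The constant $\beta$ is given by

$$\beta = 1 - \mathbb{P}(|\vartheta| \le h(a)) + \mathbb{P}\left(|\vartheta| \le \left(1 + \frac{a^2}{t}\right)^{-1/2} h(a)\right),$$

where $\vartheta$ is a random variable with the t-Student distribution with $t$ degrees of freedom.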
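
The minorization claim can be checked numerically on a grid. The sketch below uses hypothetical function names and assumes the unnormalized transition density $(1 + \mu'^2/t)^{t/2}(1 + \mu'^2/t + \mu^2/t)^{-(t+1)/2}$ for a move from $\mu'$ to $\mu$, together with the threshold $h(a)$ stated above. It verifies that the density from any starting point in $[-a, a]$ dominates the piecewise minimum; since the normalizing constant depends only on $t$, comparing unnormalized densities is enough.

```python
import math


def trans_density_unnorm(mu, mu_prev, t):
    """Transition density p(mu | mu_prev), up to a constant depending on t."""
    base = 1.0 + mu_prev ** 2 / t
    return base ** (t / 2.0) * (base + mu ** 2 / t) ** (-(t + 1) / 2.0)


def h(a, t):
    """Threshold where p(mu | 0) and p(mu | a) cross, with s^2 = t."""
    return math.sqrt(a ** 2 / ((1.0 + a ** 2 / t) ** (t / (t + 1.0)) - 1.0) - t)


def p_min_unnorm(mu, a, t):
    """Piecewise minimum: p(mu | a) inside [-h(a), h(a)], p(mu | 0) outside."""
    start = a if abs(mu) <= h(a, t) else 0.0
    return trans_density_unnorm(mu, start, t)


def minorization_holds(a, t, n_mu=161, n_prev=41, mu_range=8.0):
    """Check p(mu | mu_prev) >= p_min(mu) on a grid with |mu_prev| <= a."""
    for i in range(n_mu):
        mu = -mu_range + 2.0 * mu_range * i / (n_mu - 1)
        floor = p_min_unnorm(mu, a, t) * (1.0 - 1e-12)
        for j in range(n_prev):
            mu_prev = -a + 2.0 * a * j / (n_prev - 1)
            if trans_density_unnorm(mu, mu_prev, t) < floor:
                return False
    return True
```

Calling `minorization_holds(10.0, 50)` exercises the setting of the illustration that accompanies the proposition ($t = 50$, $a = 10$).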




Figure 0. Illustration of Proposition 6.4, with $t = 50$ and $a = 10$. Solid lines are graphs of $p(\mu_i|0)$ and $p(\mu_i|a)$. Bold line is the graph of $p_{\min}(\mu_i)$. Gray dotted lines are graphs of $p(\mu_i|\mu_{i-1})$ for some selected positive $\mu_{i-1} \le a$.

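Proof. The formula for $p_{\min}$ results from minimisation of $p(\mu_i|\mu_{i-1})$ with respect to $\mu_{i-1} \in [-a, a]$. We use (6.1). First compute $(\partial/\partial\mu_{i-1})\,p(\mu_i|\mu_{i-1})$ to check that for every $\mu_i$ the function $\mu_{i-1} \mapsto p(\mu_i|\mu_{i-1})$ has to attain its minimum either at $0$ or at $a$. Indeed,

$$\begin{aligned}
\frac{\partial}{\partial\mu_{i-1}}\,p(\mu_i|\mu_{i-1}) &= \mathrm{const}\cdot\left[\frac{t}{2}(s^2 + \mu_{i-1}^2)^{t/2-1}(s^2 + \mu_{i-1}^2 + \mu_i^2)^{-(t+1)/2} - \frac{t+1}{2}(s^2 + \mu_{i-1}^2)^{t/2}(s^2 + \mu_{i-1}^2 + \mu_i^2)^{-(t+3)/2}\right]\cdot 2\mu_{i-1} \\
&= \mathrm{const}\cdot\mu_{i-1}(s^2 + \mu_{i-1}^2)^{t/2-1}(s^2 + \mu_{i-1}^2 + \mu_i^2)^{-(t+3)/2}\cdot\left[t(s^2 + \mu_{i-1}^2 + \mu_i^2) - (t+1)(s^2 + \mu_{i-1}^2)\right].
\end{aligned}$$

Assuming that $\mu_{i-1} > 0$, the first factor on the right hand side of the above equation is positive, so $(\partial/\partial\mu_{i-1})\,p(\mu_i|\mu_{i-1}) > 0$ iff $t(s^2 + \mu_{i-1}^2 + \mu_i^2) - (t+1)(s^2 + \mu_{i-1}^2) > 0$, that is iff

$$\mu_{i-1}^2 < t\mu_i^2 - s^2.$$

Consequently, if $t\mu_i^2 - s^2 \le 0$ then the function $\mu_{i-1} \mapsto p(\mu_i|\mu_{i-1})$ is decreasing for $\mu_{i-1} > 0$ and $\min_{0 \le \mu_{i-1} \le a} p(\mu_i|\mu_{i-1}) = p(\mu_i|a)$. If $t\mu_i^2 - s^2 > 0$ then this function first increases and then decreases. In either case we have $\min_{0 \le \mu_{i-1} \le a} p(\mu_i|\mu_{i-1}) = \min[p(\mu_i|a), p(\mu_i|0)]$. Thus using symmetry, $p(\mu_i|\mu_{i-1}) = p(\mu_i|-\mu_{i-1})$, we obtain

$$p_{\min}(\mu_i) = \min_{|\mu_{i-1}| \le a} p(\mu_i|\mu_{i-1}) = \begin{cases} p(\mu_i|a) & \text{if } p(\mu_i|a) \le p(\mu_i|0), \\ p(\mu_i|0) & \text{if } p(\mu_i|a) > p(\mu_i|0). \end{cases}$$

Now it is enough to solve the inequality, say, $p(\mu|0) < p(\mu|a)$, with respect to $\mu$. The following elementary computation shows that this inequality is fulfilled iff $|\mu| > h(a)$:

$$\begin{aligned}
\frac{(s^2)^{t/2}}{(s^2 + \mu^2)^{(t+1)/2}} &< \frac{(s^2 + a^2)^{t/2}}{(s^2 + a^2 + \mu^2)^{(t+1)/2}}, \quad \text{iff} \\
\left(\frac{s^2 + a^2 + \mu^2}{s^2 + \mu^2}\right)^{(t+1)/2} &< \left(\frac{s^2 + a^2}{s^2}\right)^{t/2}, \quad \text{iff} \\
\frac{a^2}{s^2 + \mu^2} &< \left(1 + \frac{a^2}{s^2}\right)^{t/(t+1)} - 1, \quad \text{iff} \\
\mu^2 &> a^2\left[\left(1 + \frac{a^2}{s^2}\right)^{t/(t+1)} - 1\right]^{-1} - s^2.
\end{aligned}$$

It is enough to recall that $s^2 = t$ and thus the right hand side above is just $h(a)^2$. To obtain the formula for $\beta$, note that

$$\beta = \int p_{\min}(\mu)\,\mathrm{d}\mu = \int_{|\mu| \le h(a)} p(\mu|a)\,\mathrm{d}\mu + \int_{|\mu| > h(a)} p(\mu|0)\,\mathrm{d}\mu$$

and use (6.2). □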
6.5 REMARK. It is interesting to compare the asymptotic behaviour of the constants in Propositions 6.3 and 6.4 for $a \to \infty$. We can immediately see that $\lambda \to 1/(t-2)$ and $K \sim a^2/(t-2)$. Slightly more tedious computation reveals that $h(a) \sim \mathrm{const}\cdot a^{1/(t+1)}$ and consequently $\beta \sim \mathrm{const}\cdot a^{-t/(t+1)}$.



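The parameter of interest is the posterior mean (Bayes estimator of $\mu$). Thus we let $f(\mu) = \mu$ and $\theta = \mathbb{E}_\pi \mu = 0$. Note that our chain $\mu_0, \ldots, \mu_i, \ldots$ is a sequence of zero-mean martingale differences (since $\mathbb{E}(\mu_i|\mu_{i-1}) = 0$), so $\bar f = f$ and

$$\sigma^2_{\mathrm{as}}(P,f) = \mathbb{E}_\pi(f^2) = \frac{t}{t-3}.$$

The MSE of the estimator $\hat\theta_n = \frac{1}{n}\sum_{i=0}^{n-1} \mu_i$ can also be expressed analytically, namely

$$\mathrm{MSE} = \frac{t}{n(t-3)} - \frac{t(t-2)}{n^2(t-3)^2}\left[1 - \left(\frac{1}{t-2}\right)^{n}\right] + \frac{t-2}{n^2(t-3)}\left[1 - \left(\frac{1}{t-2}\right)^{n}\right]\mu_0^2.$$

Obviously we have $\|\bar f\|_{V^{1/2}} = 1$.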
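
The analytic MSE follows from two facts derived above: the $\mu_i$ are uncorrelated with zero mean, and $\mathbb{E}(\mu_i^2|\mu_{i-1}) = (t + \mu_{i-1}^2)/(t-2)$. The sketch below (function names are hypothetical) evaluates the closed form obtained by summing the resulting geometric series and cross-checks it against a direct recursion on the second moments.

```python
def exact_mse(t, n, mu0):
    """Closed-form MSE of the ergodic average for a chain started at mu0.

    With m_i = E mu_i^2 one has m_i = m* + (mu0^2 - m*) r^i, where
    m* = t/(t-3) and r = 1/(t-2), and MSE = (1/n^2) * sum_{i<n} m_i.
    """
    r = 1.0 / (t - 2.0)
    m_star = t / (t - 3.0)
    geom = (1.0 - r ** n) / (1.0 - r)  # sum of r^i for i = 0..n-1
    return (n * m_star + (mu0 ** 2 - m_star) * geom) / n ** 2


def mse_by_recursion(t, n, mu0):
    """Same quantity via the recursion m_i = (t + m_{i-1}) / (t - 2)."""
    m, total = mu0 ** 2, 0.0
    for _ in range(n):
        total += m
        m = (t + m) / (t - 2.0)
    return total / n ** 2
```

Comparing `exact_mse(50, 100, 0.0)` with `mse_by_recursion(50, 100, 0.0)` is a cheap guard against algebra slips, and the product `n * exact_mse(t, n, mu0)` approaches $t/(t-3)$ as `n` grows.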

We now proceed to examine the bounds proved in Section 4 under the geometric drift condition, Assumption 4.1. Inequalities for the asymptotic variance play the crucial role in our approach. Let us fix $t = 50$. Figure 1 shows how our bounds on $\sigma_{\mathrm{as}}(P,f)$ depend on the choice of the small set $J = [-a, a]$.

The gray solid line gives the bound of Theorem 4.2 (ii) which assumes the knowledge of $\pi V$ (and uses the obvious inequality $\pi(V^{1/2}) \le (\pi V)^{1/2}$). The black dashed line corresponds to a bound which involves only $\lambda$, $K$ and $\beta$. It is obtained if the values of $\pi V$ and $\pi V^{1/2}$ are replaced by their respective bounds given in Proposition 4.5 (i) and (ii).
[Plot not reproduced. Horizontal axis: $a$, from about 2 to 12. Legend: bound for $\sigma_{\mathrm{as}}(P,f)$ in terms of $K$, $\lambda$ and $\beta$; bound using the true value of $\pi(V)$; true value of $\sigma_{\mathrm{as}}(P,f) = 1.031$.]

Figure 1. Bounds for the root asymptotic variance $\sigma_{\mathrm{as}}(P,f)$ as functions of $a$.

The best values of the bounds, equal to 2.68 and 2.38, correspond to $a = 3.91$ and $a = 4.30$, respectively. The actual value of the root asymptotic variance is $\sigma_{\mathrm{as}}(P,f) = 1.031$. In Table 1 below we summarise the analogous bounds for three values of $t$.



  t      $\sigma_{\rm as}(P,f)$    Bound with known $\pi V$    Bound involving only $\lambda$, $K$, $\beta$
  5      1.581           6.40                   11.89
  50     1.031           2.38                   2.68
  500    1.003           2.00                   2.08

Table 1. Values of $\sigma_{\rm as}(P,f)$ vs. bounds of Theorem 4.2 (ii) combined with Proposition 4.5 (i)
and (ii) for different values of $t$.

The results obtained for different values of parameter t lead to qualitatively similar 
conclusions. From now on we keep t = 50 fixed. 

Table 2 is analogous to Table 1 but focuses on other constants introduced in Theorem
3.1. Apart from $\sigma_{\rm as}(P,f)$, we compare $C_0(P)$, $C_1(P,f)$, $C_2(P,f)$ with the bounds given
in Theorem 4.2 and Proposition 4.5. The "actual values" of $C_0(P)$, $C_1(P,f)$, $C_2(P,f)$
are computed via a long Monte Carlo simulation (in which we identified regeneration
epochs). The bound for $C_1(P,f)$ in Theorem 4.2 (iii) depends on $\xi V$, which is typically
known, because usually simulation starts from a deterministic initial point, say $x_0$ (in our
experiments we put $x_0 = 0$). As for $C_2(P,f)$, its actual value varies with $n$. However, in
our experiments the dependence on $n$ was negligible and has been ignored (the differences
were within the accuracy of the reported computations, provided that $n \ge 10$).



  Constant      Actual value    Bound with known $\pi V$    Bound involving only $\lambda$, $K$, $\beta$
  $C_0(P)$      0.568           1.761                  2.025
  $C_1(P,f)$    0.125           -                      2.771
  $C_2(P,f)$    1.083           -                      3.752

Table 2. Values of the constants appearing in Theorem 3.1 vs. bounds of Theorem 4.2
combined with Proposition 4.5.

Finally, let us compare the actual values of the root mean square error, RMSE :=
$\sqrt{E_\xi(\hat\theta_n - \theta)^2}$, with the bounds given in Theorem 3.1. In column (a) we use the formula
(3.2) with "true" values of $\sigma_{\rm as}(P,f)$ and $C_0(P)$, $C_1(P,f)$, $C_2(P,f)$ given by (3.3)-(3.6).
Column (b) is obtained by replacing those constants by their bounds given in Theorem
4.2 and using the true value of $\pi V$. Finally, the bounds involving only $\lambda$, $K$, $\beta$ are in
column (c).



                                Bound (3.2)
  n        $\sqrt{n}$ RMSE    (a)      (b)      (c)
  10       0.98          1.47     4.87     5.29
  50       1.02          1.21     3.39     3.71
  100      1.03          1.16     3.08     3.39
  1000     1.03          1.07     2.60     2.89
  5000     1.03          1.05     2.48     2.77
  10000    1.03          1.04     2.45     2.75
  50000    1.03          1.04     2.41     2.71

Table 3. RMSE, its bound in Theorem 3.1 and further bounds based on Theorem 4.2 combined
with Proposition 4.5.
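
The "actual value" entries of Tables 1 and 3 can be cross-checked numerically. The sketch below assumes the closed forms $\sigma^2_{\rm as}(P,f) = t/(t-3)$ and ${\rm MSE} = \frac{t}{n(t-3)} - \frac{t(t-2)}{n^2(t-3)^2}(1-(t-2)^{-n}) + \frac{t-2}{n^2(t-3)}(1-(t-2)^{-n})\mu_0^2$ for the average of $\mu_0, \ldots, \mu_{n-1}$. These expressions are reconstructions that agree with the tabulated values, not quotations, and the function names are illustrative only.

```python
import math


def sigma_as(t):
    """Root asymptotic variance, assuming sigma_as^2 = t / (t - 3)."""
    return math.sqrt(t / (t - 3.0))


def mse(n, t, mu0=0.0):
    """Assumed closed-form MSE of the average of mu_0, ..., mu_{n-1}."""
    decay = 1.0 - (t - 2.0) ** (-n)           # 1 - (t-2)^{-n}
    stationary = t / (n * (t - 3.0))           # leading term sigma_as^2 / n
    correction = t * (t - 2.0) / (n ** 2 * (t - 3.0) ** 2) * decay
    start = (t - 2.0) / (n ** 2 * (t - 3.0)) * decay * mu0 ** 2
    return stationary - correction + start


def scaled_rmse(n, t, mu0=0.0):
    """sqrt(n) * RMSE, the quantity tabulated in Table 3."""
    return math.sqrt(n * mse(n, t, mu0))


if __name__ == "__main__":
    for t in (5, 50, 500):
        print("t =", t, "sigma_as =", round(sigma_as(t), 3))
    for n in (10, 50, 100, 1000):
        print("n =", n, "sqrt(n) RMSE =", round(scaled_rmse(n, 50), 2))
```

With $t = 50$ and $\mu_0 = 0$ the second term is what pulls $\sqrt{n}\,$RMSE below $\sigma_{\rm as}(P,f)$ for small $n$, in line with the first rows of Table 3.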

Table 3 clearly shows that the inequalities in Theorem 3.1 are quite sharp. The bounds
on RMSE in column (a) become almost exact for large $n$. However, the bounds on the
constants in terms of minorization/drift parameters are far from being tight. While constants
$C_0(P)$, $C_1(P,f)$, $C_2(P,f)$ have relatively small influence, the problem of bounding
$\sigma_{\rm as}(P,f)$ is of primary importance.

This clearly identifies the bottleneck of the approach: the bounds on $\sigma_{\rm as}(P,f)$ under
drift condition in Theorem 4.2 and Proposition 4.5 can vary widely in their sharpness
in specific examples. We conjecture that this may be the case in general for any bounds
derived under drift conditions. Known bounds on the rate of convergence (e.g. in total
variation norm) obtained under drift conditions are typically very conservative, too (e.g.
[Bax05, RT99, JH04]). However, at present, drift conditions remain the main and most
universal tool for proving computable bounds for Markov chains on continuous spaces.
An alternative might be working with conductance but to the best of our knowledge,
so far this approach has been applied successfully only to examples with compact state
spaces (see e.g. [Rud09, MN07] and references therein).

6.2. A Poisson-Gamma Model 

Consider a hierarchical Bayesian model applied to a well-known pump failure data set
and analysed in several papers (e.g. [GS90, Tie94, MTY95, Ros95a]). Data are available
e.g. in [Da\ (JH], R package "SMPracticals" or in the cited Tierney's paper. They consist
of $m = 10$ pairs $(y_i, t_i)$ where $y_i$ is the number of failures for the $i$th pump, during $t_i$ observed
hours. The model assumes that:

$y_i \sim {\rm Poiss}(t_i\phi_i)$, conditionally independent for $i = 1, \ldots, m$,
$\phi_i \sim {\rm Gamma}(\alpha, r)$, conditionally i.i.d. for $i = 1, \ldots, m$,
$r \sim {\rm Gamma}(\sigma, \gamma)$.

The posterior distribution of parameters $\phi = (\phi_1, \ldots, \phi_m)$ and $r$ is

\[ \pi(\phi, r \mid y) \propto \left( \prod_{i=1}^{m} \phi_i^{y_i+\alpha-1} e^{-(t_i+r)\phi_i} \right) r^{m\alpha+\sigma-1} e^{-\gamma r}, \]

where $\alpha$, $\sigma$, $\gamma$ are known hyperparameters. The Gibbs sampler updates cyclically $r$ and
$\phi$ using the following conditional distributions:

$r \mid \phi, y \sim {\rm Gamma}\left(m\alpha + \sigma,\ \gamma + \sum_{i=1}^{m}\phi_i\right)$,

$\phi_i \mid \phi_{-i}, r, y \sim {\rm Gamma}(y_i + \alpha,\ t_i + r)$.
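
The two-block update just described can be sketched in a few lines. The code below is only an illustration under stated assumptions: Gamma(a, b) is read as shape a and rate b, the function name `gibbs_poisson_gamma` is hypothetical, and the small data set in the demo is made up for the example (it is not the pump failure data).

```python
import random


def gibbs_poisson_gamma(y, t, alpha, sigma, gamma, n_iter, phi_init, seed=0):
    """Run n_iter sweeps of the Gibbs sampler; returns a list of (r, phi) states."""
    rng = random.Random(seed)
    m = len(y)
    phi = list(phi_init)
    path = []
    for _ in range(n_iter):
        # r | phi, y ~ Gamma(shape = m*alpha + sigma, rate = gamma + sum(phi));
        # random.gammavariate takes (shape, scale), hence scale = 1 / rate.
        r = rng.gammavariate(m * alpha + sigma, 1.0 / (gamma + sum(phi)))
        # phi_i | r, y ~ Gamma(shape = y_i + alpha, rate = t_i + r), independently in i.
        phi = [rng.gammavariate(y[i] + alpha, 1.0 / (t[i] + r)) for i in range(m)]
        path.append((r, phi))
    return path


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    y_demo = [2, 0, 5]
    t_demo = [10.0, 4.0, 20.0]
    path = gibbs_poisson_gamma(y_demo, t_demo, alpha=1.802, sigma=0.01, gamma=1.0,
                               n_iter=1000, phi_init=[1.0, 1.0, 1.0])
    phi1_mean = sum(state[1][0] for state in path) / len(path)
    print("ergodic average of phi_1:", phi1_mean)
```

The ergodic average printed at the end plays the role of the estimator whose error the bounds of this paper control.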

In what follows, the numeric results correspond to the same hyperparameter values as
in the above cited papers: $\alpha = 1.802$, $\sigma = 0.01$ and $\gamma = 1$. For these values, Rosenthal in
[Ros95a] constructed a small set $J = \{(\phi, r) : 4 \le \sum\phi_i \le 9\}$ which satisfies the one-step
minorization condition (our Assumption 2.1) and established a geometric drift condition
towards $J$ (our Assumption 4.1) with $V(\phi, r) = 1 + (\sum\phi_i - 6.5)^2$. The minorization and
drift constants were the following:

$\beta = 0.14$, $\lambda = 0.46$, $K = 3.3$.

Suppose we are to estimate the posterior expectation of a component $\phi_1$. To get a bound
on the (root-) MSE of the MCMC estimate, we combine Theorem 3.1 with Proposition 4.2
and Proposition 4.5. Suppose we start simulations at a point with $\sum\phi_i = 6.5$, i.e. with
initial value of $V$ equal to 1. To get a better bound on $\|\bar f\|_{V^{1/2}}$ via Proposition 4.5 (v),
we first reduce $\|f\|_{V^{1/2}}$ by a vertical shift, namely we put $f(\phi, r) = \phi_1 - b$ for $b = 3.327$
(expectation of $\phi_1$ can be immediately recovered from that of $\phi_1 - b$). Elementary
and easy calculations show that $\|f\|_{V^{1/2}} \le 3.327$. We also use the bound taken from
Proposition 4.5 (ii) for $\pi(V)$ and the inequality $\pi(V^{1/2}) \le \pi(V)^{1/2}$. Finally we obtain the
following values of the constants:

$\sigma_{\rm as}(P,f) = 171.6$ and $C_0(P) = 27.5$, $C_1(P,f) = 547.7$, $C_2(P,f) = 676.1$.
6.3. Contracting Normals

As discussed in the Introduction, the results of the present paper improve over earlier
MSE bounds of [LN11] for geometrically ergodic chains in that they are much more generally
applicable and also tighter. To illustrate the improvement in tightness we analyze
the MSE and confidence estimation for the contracting normals toy example considered
in [LN11].

For the Markov chain transition kernel

$P(x, \cdot) = N(cx, 1 - c^2)$, with $|c| < 1$, on $\mathcal{X} = \mathbb{R}$,

with stationary distribution $N(0, 1)$, consider estimating the mean, i.e. put $f(x) = x$.
Similarly as in [LN11] we take a drift function $V(x) = 1 + x^2$ resulting in $\|f\|_{V^{1/2}} = 1$.
With the small set $J = [-d, d]$ with $d > 1$, the drift and regeneration parameters can be
identified as

where $\Phi$ stands for the standard normal cdf. We refer to [LN11, Bax05] for details on
these elementary calculations.
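
To make the drift part of this calculation concrete, the following sketch checks a geometric drift inequality numerically for the kernel $N(cx, 1-c^2)$ with $V(x) = 1 + x^2$. It relies on the identity $PV(x) = c^2 V(x) + 2(1-c^2)$, from which one admissible pair is $\lambda = c^2 + 2(1-c^2)/(1+d^2)$ and $K = 2 + c^2(d^2-1)$. These expressions are derived here for illustration and are an assumption; they need not coincide with the constants used in [LN11, Bax05]. The minorization constant $\beta$ is not covered.

```python
def V(x):
    """Drift function V(x) = 1 + x^2."""
    return 1.0 + x * x


def PV(x, c):
    """Exact E[V(X_1) | X_0 = x] for X_1 ~ N(c x, 1 - c^2)."""
    return 1.0 + (c * x) ** 2 + (1.0 - c * c)


def drift_constants(c, d):
    """One admissible (lambda, K) for the small set [-d, d]; derived for this sketch."""
    lam = c * c + 2.0 * (1.0 - c * c) / (1.0 + d * d)
    K = 2.0 + c * c * (d * d - 1.0)
    return lam, K


def check_drift(c, d, grid, tol=1e-12):
    """Check PV <= lam * V outside J and PV <= K inside J on a grid of points."""
    lam, K = drift_constants(c, d)
    for x in grid:
        if abs(x) > d:
            if PV(x, c) > lam * V(x) + tol:
                return False
        elif PV(x, c) > K + tol:
            return False
    return True


if __name__ == "__main__":
    c, d = 0.5, 1.7875
    grid = [i / 100.0 for i in range(-1000, 1001)]
    print(drift_constants(c, d), check_drift(c, d, grid))
```

Note that $\lambda < 1$ holds exactly when $d > 1$, which matches the restriction on the small set stated above.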

To compare with the results of [LN11] we aim at confidence estimation of the mean.
First, we combine Theorem 3.1 with Proposition 4.2 and Proposition 4.5 to upper bound
the MSE of $\hat\theta_n$ and next we use the Chebyshev inequality. We derive the resulting minimal
simulation length $n$ guaranteeing

$P(|\hat\theta_n - \theta| \le \varepsilon) \ge 1 - \alpha$, with $\varepsilon = \alpha = 0.1$.

This is equivalent to finding minimal $n$ s.t.

${\rm MSE}(\hat\theta_n) \le \varepsilon^2\alpha$.

Note that for small values of $\alpha$ a median trick can be applied resulting in exponentially
tight bounds, see [NP09, LN11, LMN11] for details. The value of $c$ is set to 0.5 and the
small set half width $d$ has been optimised numerically for each method yielding $d = 1.6226$
for the bounds from [LN11] and $d = 1.7875$ for the results based on our Section 4. The
chain is initiated at 0, i.e. $\xi = \delta_0$. Since in this setting the exact distribution of $\hat\theta_n$ can
be computed analytically, both bounds are compared to reality, which is the exact true
simulation effort required for the above confidence estimation.

As illustrated by Table 4, we obtain an improvement of 5 orders of magnitude compared
to [LN11] and remain less than 2 orders of magnitude off the truth.



  Bound involving only $\lambda$, $K$, $\beta$    Bound with known $\pi V$    Bound from [LN11]    Reality
  77,285                          43,783                 6,460,000,000        811

Table 4. Comparison of the total simulation effort $n$ required for nonasymptotic confidence
estimation $P(|\hat\theta_n - \theta| \le \varepsilon) \ge 1 - \alpha$ with $\varepsilon = \alpha = 0.1$ and the target function $f(x) = x$.
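
For orientation, the order of magnitude of the "Reality" entry can be reproduced from the Gaussian structure of the chain. The sketch below is a rough illustration under assumptions that are not taken from the text: the estimator is taken to be the average of $X_1, \ldots, X_n$ for the chain started at $X_0 = 0$, and the function name `minimal_n` is hypothetical. Because such conventions are not recoverable from this section, the result should be close to, but need not coincide with, the tabulated value.

```python
import math
from statistics import NormalDist


def minimal_n(c, eps, alpha, n_max=10 ** 6):
    """Smallest n with P(|average of X_1..X_n| <= eps) >= 1 - alpha, chain started at 0.

    For X_{i+1} = c X_i + noise with noise variance 1 - c^2 and X_0 = 0, the average
    is centred Gaussian with variance
        (1 - c^2) / (1 - c)^2 * sum_{j=1}^{n} (1 - c^j)^2 / n^2.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided normal quantile
    scale = (1.0 - c * c) / (1.0 - c) ** 2
    total = 0.0
    for n in range(1, n_max + 1):
        total += (1.0 - c ** n) ** 2              # running sum of (1 - c^j)^2
        sd = math.sqrt(scale * total) / n
        if z * sd <= eps:
            return n
    return None


if __name__ == "__main__":
    print(minimal_n(c=0.5, eps=0.1, alpha=0.1))
```

The Chebyshev-based requirement ${\rm MSE}(\hat\theta_n) \le \varepsilon^2\alpha$ is stricter than this exact Gaussian calculation, which is one reason why all bounds in Table 4 exceed the "Reality" column.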
7. Preliminary Lemmas

Before we proceed to the proofs for Sections 4 and 5, we need some auxiliary results that
might be of independent interest.

We work under Assumptions 2.1 (small set) and 5.1 (the drift condition). Note that
4.1 is the special case of 5.1, with $\alpha = 1$. Assumption 4.1 implies

\[ PV^{1/2}(x) \le \begin{cases} \lambda^{1/2} V^{1/2}(x) & \text{for } x \notin J, \\ K^{1/2} & \text{for } x \in J, \end{cases} \tag{7.1} \]

because by Jensen's inequality $PV^{1/2}(x) \le (PV(x))^{1/2}$. Whereas for $\alpha < 1$, Lemma 3.5 of
[JR02] for all $\eta \le 1$ yields

\[ PV^{\eta}(x) \le \begin{cases} V^{\eta}(x) - \eta(1-\lambda)V(x)^{\eta+\alpha-1} & \text{for } x \notin J, \\ K^{\eta} & \text{for } x \in J. \end{cases} \tag{7.2} \]

The following lemma is a well-known fact which appears e.g. in [Num02] (for bounded
$g$). The proof for nonnegative function $g$ is the same.

7.3 Lemma. If $g \ge 0$ then

\[ E_\nu \Xi(g)^2 = E_\nu T \left( E_\pi g(X_0)^2 + 2\sum_{n=1}^{\infty} E_\pi g(X_0)g(X_n) I(T > n) \right). \]

We shall also use the generalised Kac Lemma, in the following form that follows as an
easy corollary from Theorem 10.0.1 of [MT93].

7.4 Lemma. If $\pi(|f|) < \infty$, then

\[ \pi(f) = \int_J E_x \sum_{i=1}^{\tau(J)} f(X_i)\,\pi(dx), \]

where

\[ \tau(J) := \min\{n > 0 : X_n \in J\}. \tag{7.5} \]

The following Lemma is related to other calculations in the drift conditions setting, 
e.g. [Bax05, LT96, DMR04, Ros02, For03, DGM08]. 

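7.6 Lemma. If Assumptions 2.1 and 5.1 hold, then for all $\eta \le 1$

\[
\begin{aligned}
E_x \sum_{n=1}^{T-1} V^{\alpha+\eta-1}(X_n)
 &\le \frac{V^{\eta}(x) - 1 + \eta(1-\lambda) - \eta(1-\lambda)V^{\alpha+\eta-1}(x)}{\eta(1-\lambda)}\, I(x \notin J) + \frac{K^{\eta}-1}{\beta\eta(1-\lambda)} + \frac{1}{\beta} - 1 \\
 &\le \frac{V^{\eta}(x)}{\eta(1-\lambda)} + \frac{K^{\eta}-1-\beta}{\beta\eta(1-\lambda)} + \frac{1}{\beta} - 1 \quad (\text{if additionally } \alpha + \eta \ge 1).
\end{aligned}
\]

7.7 Corollary. For $E_x \sum_{n=0}^{T-1} V^{\alpha+\eta-1}(X_n)$ we need to add the term $V^{\alpha+\eta-1}(x)$. Hence

\[
\begin{aligned}
E_x \sum_{n=0}^{T-1} V^{\alpha+\eta-1}(X_n)
 &\le \frac{V^{\eta}(x) - 1 + \eta(1-\lambda) - \eta(1-\lambda)V^{\alpha+\eta-1}(x)}{\eta(1-\lambda)}\, I(x \notin J) + V^{\alpha+\eta-1}(x) + \frac{K^{\eta}-1}{\beta\eta(1-\lambda)} + \frac{1}{\beta} - 1 \\
 &\le \frac{V^{\eta}(x)}{\eta(1-\lambda)} + \frac{K^{\eta}-1-\beta}{\beta\eta(1-\lambda)} + \frac{1}{\beta}.
\end{aligned}
\]

In the case of geometric drift, the second inequality in Lemma 7.6 can be replaced
by a slightly better bound. For $\alpha = \eta = 1$, the first inequality in Lemma 7.6 entails the
following.

7.8 Corollary. If Assumptions 2.1 and 4.1 hold then

\[ E_x \sum_{n=1}^{T-1} V(X_n) \le \frac{\lambda V(x)}{1-\lambda} + \frac{K - \lambda - \beta}{\beta(1-\lambda)}. \]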

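Proof of Lemma 7.6. The proof is given for $\eta = 1$, because for $\eta < 1$ it is identical
and the constants can be obtained from (7.2).

Let $S := S_0 := \min\{n \ge 0 : X_n \in J\}$ and $S_j := \min\{n > S_{j-1} : X_n \in J\}$ for
$j = 1, 2, \ldots$. Moreover set

\[ H(x) := E_x \sum_{n=0}^{S} V^{\alpha}(X_n), \]

\[ H := \sup_{x \in J} E_x\left( \sum_{n=1}^{S_1} V^{\alpha}(X_n) \,\Big|\, \Gamma_0 = 0 \right) = \sup_{x \in J} \int Q(x, dy) H(y). \]

Note that $H(x) = V^{\alpha}(x)$ for $x \in J$ and recall that $Q$ denotes the normalized "residual
kernel" defined in Section 2.
We will first show that

\[ H(x) \le \frac{V(x) - \lambda}{1-\lambda} \quad \text{for } x \in \mathcal{X}. \tag{7.9} \]

Let $\mathcal{F}_n = \sigma(X_0, \ldots, X_n)$ and, remembering that $\eta = 1$, rewrite (7.2) as

\[ V^{\alpha}(X_n) I(X_n \notin J) \le \frac{1}{1-\lambda}\left[ V(X_n) - E(V(X_{n+1}) \mid \mathcal{F}_n) \right] I(X_n \notin J). \tag{7.10} \]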
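Fix $x \notin J$. Since $\{X_n \notin J\} \supseteq \{S > n\} \in \mathcal{F}_n$ we can apply (7.10) and write

\[
\begin{aligned}
E_x \sum_{n=0}^{(S-1)\wedge m} V^{\alpha}(X_n)
 &= E_x \sum_{n=0}^{m} V^{\alpha}(X_n) I(S > n) \\
 &\le \frac{1}{1-\lambda} \sum_{n=0}^{m} \left[ E_x V(X_n) I(S > n) - E_x E\left( V(X_{n+1}) I(S > n) \mid \mathcal{F}_n \right) \right] \\
 &= \frac{1}{1-\lambda} \sum_{n=0}^{m} \left[ E_x V(X_n) I(S > n) - E_x V(X_{n+1}) I(S > n+1) - E_x V(X_{n+1}) I(S = n+1) \right] \\
 &= \frac{1}{1-\lambda} \left[ V(x) - E_x V(X_{m+1}) I(S > m+1) - \sum_{n=0}^{m} E_x V(X_{n+1}) I(S = n+1) \right] \\
 &= \frac{V(x) - E_x V(X_{S\wedge(m+1)})}{1-\lambda},
\end{aligned}
\]

so

\[
\begin{aligned}
E_x \sum_{n=0}^{S\wedge(m+1)} V^{\alpha}(X_n)
 &= E_x \sum_{n=0}^{(S-1)\wedge m} V^{\alpha}(X_n) + E_x V^{\alpha}(X_{S\wedge(m+1)}) \\
 &\le \frac{V(x) - E_x V(X_{S\wedge(m+1)})}{1-\lambda} + E_x V(X_{S\wedge(m+1)}) \\
 &= \frac{V(x) - \lambda E_x V(X_{S\wedge(m+1)})}{1-\lambda} \le \frac{V(x) - \lambda}{1-\lambda}.
\end{aligned}
\]

Letting $m \to \infty$ yields equation (7.9) for $x \notin J$. For $x \in J$, (7.9) is obvious.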

Next, from Assumption 5.1 we obtain $PV(x) = (1-\beta)QV(x) + \beta\nu V \le K$ for $x \in J$,
so $QV(x) \le (K-\beta)/(1-\beta)$ and, taking into account (7.9),

\[ H \le \frac{(K-\beta)/(1-\beta) - \lambda}{1-\lambda} = \frac{K - \lambda - \beta(1-\lambda)}{(1-\lambda)(1-\beta)}. \tag{7.11} \]
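Recall that $T := \min\{n \ge 1 : \Gamma_{n-1} = 1\}$. For $x \in J$ we thus have

\[
\begin{aligned}
E_x \sum_{n=1}^{T-1} V^{\alpha}(X_n)
 &= E_x \sum_{j=1}^{\infty} \sum_{n=S_{j-1}+1}^{S_j} V^{\alpha}(X_n) I(\Gamma_{S_0} = \cdots = \Gamma_{S_{j-1}} = 0) \\
 &\le \sum_{j=1}^{\infty} (1-\beta)^{j} H \le \frac{K - \lambda - \beta(1-\lambda)}{\beta(1-\lambda)}
\end{aligned}
\]

by (7.11). For $x \notin J$ we have to add one more term and note that the above calculation
also applies:

\[ E_x \sum_{n=1}^{T-1} V^{\alpha}(X_n) = E_x \sum_{n=1}^{S_0} V^{\alpha}(X_n) + E_x \sum_{j=1}^{\infty} \sum_{n=S_{j-1}+1}^{S_j} V^{\alpha}(X_n) I(\Gamma_{S_0} = \cdots = \Gamma_{S_{j-1}} = 0). \]

The extra term is equal to $H(x) - V^{\alpha}(x)$ and we use (7.9) to bound it. Finally we obtain

\[ E_x \sum_{n=1}^{T-1} V^{\alpha}(X_n) \le \frac{V(x) - \lambda - (1-\lambda)V^{\alpha}(x)}{1-\lambda}\, I(x \notin J) + \frac{K - \lambda - \beta(1-\lambda)}{\beta(1-\lambda)}. \tag{7.12} \]

□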



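7.13 Lemma. If Assumptions 2.1 and 5.1 hold, then
(i) for all $\eta \le \alpha$

\[ \pi(V^{\eta}) \le \left( \frac{K-\lambda}{1-\lambda} \right)^{\eta/\alpha}, \]

(ii)

\[ \pi(J) \ge \frac{1-\lambda}{K-\lambda}, \]

(iii) for all $n \ge 0$ and $\eta \le \alpha$

\[ E_\nu V^{\eta}(X_n) \le \frac{1}{\beta^{\eta/\alpha}} \left( \frac{K-\lambda}{1-\lambda} \right)^{2\eta/\alpha}. \]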

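Proof. It is enough to prove (i) and (iii) for $\eta = \alpha$ and apply the Jensen inequality for
$\eta < \alpha$. We shall need an upper bound on $E_x \sum_{n=1}^{\tau(J)} V^{\alpha}(X_n)$ for $x \in J$, where $\tau(J)$ is
defined in (7.5). From the proof of Lemma 7.6

\[ E_x \sum_{n=1}^{\tau(J)} V^{\alpha}(X_n) = PH(x) \le \frac{K-\lambda}{1-\lambda}, \quad x \in J. \]

And by Lemma 7.4 we obtain

\[ 1 \le \pi V^{\alpha} = \int_J E_x \sum_{n=1}^{\tau(J)} V^{\alpha}(X_n)\,\pi(dx) \le \pi(J)\frac{K-\lambda}{1-\lambda}, \]

which implies (i) and (ii).

By integrating the small set Assumption 2.1 with respect to $\pi$ and from (ii) of the
current lemma, we obtain

\[ \frac{d\nu}{d\pi} \le \frac{1}{\beta\pi(J)} \le \frac{K-\lambda}{\beta(1-\lambda)}. \]

Consequently

\[ E_\nu V^{\alpha}(X_n) = \int_{\mathcal{X}} P^{n}V^{\alpha}(x)\frac{d\nu}{d\pi}\,\pi(dx) \le \frac{K-\lambda}{\beta(1-\lambda)} \int_{\mathcal{X}} P^{n}V^{\alpha}(x)\,\pi(dx) = \frac{K-\lambda}{\beta(1-\lambda)}\,\pi(V^{\alpha}) \]

and (iii) results from (i). □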

8. Proofs for Section 4 and 5 

In the proofs for Section 4 we work under Assumption 4.1 and repeatedly use Corol- 
lary 7.8. 

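Proof of Theorem 4.2. (i) Recall that $C_0(P) = E_\pi T - \frac{1}{2}$, write

\[ E_\pi T \le 1 + E_\pi \sum_{n=1}^{T-1} V(X_n) \]

and use Corollary 7.8. The proof of the alternative statement (i') uses first (7.1) and then
is the same.

(ii) Without loss of generality assume that $\|\bar f\|_{V^{1/2}} = 1$. By Lemma 7.3 we then have

\[
\begin{aligned}
\sigma^2_{\rm as}(P,f) &= E_\nu(\Xi(\bar f))^2 / E_\nu T \le E_\nu(\Xi(V^{1/2}))^2 / E_\nu T \\
 &= E_\pi V(X_0) + 2E_\pi \sum_{n=1}^{T-1} V^{1/2}(X_0)V^{1/2}(X_n) =: {\rm I} + {\rm II}.
\end{aligned}
\]

To bound the second term we will use Corollary 7.8 with $V^{1/2}$ in place of $V$, which is
legitimate because of (7.1).

\[
\begin{aligned}
{\rm II}/2 &= E_\pi \sum_{n=1}^{T-1} V^{1/2}(X_0)V^{1/2}(X_n) = E_\pi V^{1/2}(X_0)\, E\left( \sum_{n=1}^{T-1} V^{1/2}(X_n) \,\Big|\, X_0 \right) \\
 &\le E_\pi V^{1/2}(X_0) \left( \frac{\lambda^{1/2}}{1-\lambda^{1/2}} V^{1/2}(X_0) + \frac{K^{1/2} - \lambda^{1/2} - \beta}{\beta(1-\lambda^{1/2})} \right) \\
 &= \frac{\lambda^{1/2}}{1-\lambda^{1/2}}\,\pi(V) + \frac{K^{1/2} - \lambda^{1/2} - \beta}{\beta(1-\lambda^{1/2})}\,\pi(V^{1/2}).
\end{aligned}
\]

Rearranging terms in I + II, we obtain

\[ \sigma^2_{\rm as}(P,f) \le \frac{1+\lambda^{1/2}}{1-\lambda^{1/2}}\,\pi(V) + \frac{2(K^{1/2} - \lambda^{1/2} - \beta)}{\beta(1-\lambda^{1/2})}\,\pi(V^{1/2}) \]

and the proof of (ii) is complete.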

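(iii) The proof is similar to that of (ii) but more delicate, because we now cannot use
Lemma 7.3. First write

\[
\begin{aligned}
E_x(\Xi(V^{1/2}))^2 &= E_x\left( \sum_{n=0}^{T-1} V^{1/2}(X_n) \right)^2 = E_x\left( \sum_{n=0}^{\infty} V^{1/2}(X_n) I(n < T) \right)^2 \\
 &= E_x \sum_{n=0}^{\infty} V(X_n) I(n < T) + 2E_x \sum_{n=0}^{\infty} \sum_{j=n+1}^{\infty} V^{1/2}(X_n)V^{1/2}(X_j) I(j < T) \\
 &=: {\rm I} + {\rm II}.
\end{aligned}
\]

The first term can be bounded directly using Corollary 7.8 applied to $V$.

\[ {\rm I} = E_x \sum_{n=0}^{\infty} V(X_n) I(n < T) \le \frac{1}{1-\lambda} V(x) + \frac{K - \lambda - \beta}{\beta(1-\lambda)}. \]

To bound the second term, first condition on $X_n$ and apply Corollary 7.8 to $V^{1/2}$, then
again apply this corollary to $V$ and to $V^{1/2}$.

\[
\begin{aligned}
{\rm II}/2 &= E_x \sum_{n=0}^{\infty} V^{1/2}(X_n) I(n < T)\, E\left( \sum_{j=n+1}^{\infty} V^{1/2}(X_j) I(j < T) \,\Big|\, X_n \right) \\
 &\le E_x \sum_{n=0}^{\infty} V^{1/2}(X_n) I(n < T) \left( \frac{\lambda^{1/2}}{1-\lambda^{1/2}} V^{1/2}(X_n) + \frac{K^{1/2} - \lambda^{1/2} - \beta}{\beta(1-\lambda^{1/2})} \right) \\
 &= \frac{\lambda^{1/2}}{1-\lambda^{1/2}}\, E_x \sum_{n=0}^{\infty} V(X_n) I(n < T) + \frac{K^{1/2} - \lambda^{1/2} - \beta}{\beta(1-\lambda^{1/2})}\, E_x \sum_{n=0}^{\infty} V^{1/2}(X_n) I(n < T) \\
 &\le \frac{\lambda^{1/2}}{1-\lambda^{1/2}} \left( \frac{1}{1-\lambda} V(x) + \frac{K - \lambda - \beta}{\beta(1-\lambda)} \right)
   + \frac{K^{1/2} - \lambda^{1/2} - \beta}{\beta(1-\lambda^{1/2})} \left( \frac{1}{1-\lambda^{1/2}} V^{1/2}(x) + \frac{K^{1/2} - \lambda^{1/2} - \beta}{\beta(1-\lambda^{1/2})} \right).
\end{aligned}
\]

Finally, rearranging terms in I + II, we obtain

\[ E_x(\Xi(V^{1/2}))^2 \le \frac{1}{(1-\lambda^{1/2})^2} V(x) + \frac{2(K^{1/2} - \lambda^{1/2} - \beta)}{\beta(1-\lambda^{1/2})^2} V^{1/2}(x) + \frac{\beta(K - \lambda - \beta) + 2(K^{1/2} - \lambda^{1/2} - \beta)^2}{\beta^2(1-\lambda^{1/2})^2}, \]

which is tantamount to the desired result.
(iv) The proof of (iii) applies the same way.

□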



Proof of Proposition 4.5. For (i) and (ii) Assumption 4.1 or respectively drift condition
(7.1) implies that $\pi V = \pi PV \le \lambda(\pi V - \pi(J)) + K\pi(J)$ and the result follows
immediately.

(iii) and (iv) by induction: $\xi P^{n+1}V = \xi P^{n}(PV) \le \xi P^{n}(\lambda V + K) \le \lambda K/(1-\lambda) + K = K/(1-\lambda)$.
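
The induction step can be illustrated numerically. The sketch below iterates the scalar recursion $u_{n+1} = \lambda u_n + K$, which dominates $\xi P^{n}V$ whenever $u_0 \ge \xi V$ because $PV \le \lambda V + K$, and checks that it never exceeds $K/(1-\lambda)$ when started below that level. The constants are the Poisson-Gamma values $\lambda = 0.46$, $K = 3.3$ and the initial value $V = 1$ quoted in Section 6.2; the function name is hypothetical.

```python
def moment_bounds(v0, lam, K, n_steps):
    """Upper bounds u_n on xi P^n V obtained from u_{n+1} = lam * u_n + K, u_0 = v0."""
    bounds = [v0]
    for _ in range(n_steps):
        bounds.append(lam * bounds[-1] + K)
    return bounds


if __name__ == "__main__":
    lam, K = 0.46, 3.3
    limit = K / (1.0 - lam)               # uniform bound K / (1 - lambda)
    bounds = moment_bounds(1.0, lam, K, 50)
    print("limit:", limit)
    print("largest iterate:", max(bounds))
```

The iterates increase towards $K/(1-\lambda)$ without crossing it, which is exactly the content of the induction.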

(v) We compute: 



- sup — < sup — — 

xex V{x) V{x) 



< sup ( 



1 



-kV 



< 11/11 



1 



< ll/lk + sup^^^ 

<J){K-X) 
(1- A) inUexV{x) 



□ 



In the proofs for Section 5 we work under Assumption 5.1 and repeatedly use Lemma 
7.6 or Corollary 7.7. 



imsart-bj ver. 2007/12/10 file: LaMiaNie_exaiiip_ll_final.tex date: March 27, 2012 



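Proof of Theorem 5.2. (i) Recall that $C_0(P) = E_\pi T - \frac{1}{2}$ and write

\[ E_\pi T \le 1 + E_\pi \sum_{n=1}^{T-1} 1 \le 1 + E_\pi \sum_{n=1}^{T-1} V^{2\alpha-1}(X_n). \]

From Lemma 7.6 with $V$, $\alpha$ and $\eta = \alpha$ we have

\[ C_0(P) \le \int_{\mathcal{X}} \left( \frac{V^{\alpha}(x) - 1}{\alpha(1-\lambda)} + \frac{K^{\alpha} - 1}{\beta\alpha(1-\lambda)} + \frac{1}{\beta} \right) \pi(dx) - \frac{1}{2}. \]
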
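(ii) Without loss of generality we can assume that $\|\bar f\|_{V^{\frac{3}{2}\alpha-1}} = 1$. By Lemma 7.3 we have

\[
\begin{aligned}
\sigma^2_{\rm as}(P,f) &= E_\nu(\Xi(\bar f))^2 / E_\nu T \le E_\nu(\Xi(V^{\frac{3}{2}\alpha-1}))^2 / E_\nu T \\
 &= E_\pi V(X_0)^{3\alpha-2} + 2E_\pi \sum_{n=1}^{T-1} V^{\frac{3}{2}\alpha-1}(X_0)V^{\frac{3}{2}\alpha-1}(X_n) =: {\rm I} + {\rm II}.
\end{aligned}
\]

To bound the second term we will use Lemma 7.6 with $V$, $\alpha$ and $\eta = \frac{\alpha}{2}$.

\[
\begin{aligned}
{\rm II}/2 &= E_\pi \sum_{n=1}^{T-1} V^{\frac{3}{2}\alpha-1}(X_0)V^{\frac{3}{2}\alpha-1}(X_n) = E_\pi V^{\frac{3}{2}\alpha-1}(X_0)\, E\left( \sum_{n=1}^{T-1} V^{\frac{3}{2}\alpha-1}(X_n) \,\Big|\, X_0 \right) \\
 &\le E_\pi V^{\frac{3}{2}\alpha-1}(X_0) \left( \frac{2V^{\frac{\alpha}{2}}(X_0) - 2}{\alpha(1-\lambda)} + \frac{2K^{\frac{\alpha}{2}} - 2}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} - 1 \right) \\
 &= \frac{2}{\alpha(1-\lambda)}\,\pi(V^{2\alpha-1}) + \left( \frac{2K^{\frac{\alpha}{2}} - 2 - 2\beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} - 1 \right) \pi(V^{\frac{3}{2}\alpha-1}).
\end{aligned}
\]

The proof of (ii) is complete.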

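(iii) The proof is similar to that of (ii) but more delicate, because we now cannot use
Lemma 7.3. Write

\[
\begin{aligned}
E_x(\Xi(V^{\frac{3}{2}\alpha-1}))^2 &= E_x\left( \sum_{n=0}^{T-1} V^{\frac{3}{2}\alpha-1}(X_n) \right)^2 = E_x\left( \sum_{n=0}^{\infty} V^{\frac{3}{2}\alpha-1}(X_n) I(n < T) \right)^2 \\
 &= E_x \sum_{n=0}^{\infty} V^{3\alpha-2}(X_n) I(n < T) + 2E_x \sum_{n=0}^{\infty} \sum_{j=n+1}^{\infty} V^{\frac{3}{2}\alpha-1}(X_n)V^{\frac{3}{2}\alpha-1}(X_j) I(j < T) \\
 &=: {\rm I} + {\rm II}.
\end{aligned}
\]

The first term can be bounded directly using Corollary 7.7 with $\eta = 2\alpha - 1$

\[ {\rm I} = E_x \sum_{n=0}^{\infty} V^{3\alpha-2}(X_n) I(n < T) \le \frac{V^{2\alpha-1}(x)}{(2\alpha-1)(1-\lambda)} + \frac{K^{2\alpha-1} - 1 - \beta}{\beta(2\alpha-1)(1-\lambda)} + \frac{1}{\beta}. \]

To bound the second term, first condition on $X_n$ and use Corollary 7.7 with $\eta = \frac{\alpha}{2}$,
then again use Corollary 7.7 with $\eta = \alpha$ and $\eta = \frac{\alpha}{2}$.

\[
\begin{aligned}
{\rm II}/2 &= E_x \sum_{n=0}^{\infty} V^{\frac{3}{2}\alpha-1}(X_n) I(n < T)\, E\left( \sum_{j=n+1}^{\infty} V^{\frac{3}{2}\alpha-1}(X_j) I(j < T) \,\Big|\, X_n \right) \\
 &\le E_x \sum_{n=0}^{\infty} V^{\frac{3}{2}\alpha-1}(X_n) I(n < T) \left( \frac{2V^{\frac{\alpha}{2}}(X_n)}{\alpha(1-\lambda)} + \frac{2K^{\frac{\alpha}{2}} - 2 - 2\beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} - 1 \right) \\
 &= \frac{2}{\alpha(1-\lambda)}\, E_x \sum_{n=0}^{\infty} V^{2\alpha-1}(X_n) I(n < T)
   + \left( \frac{2K^{\frac{\alpha}{2}} - 2 - 2\beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} - 1 \right) E_x \sum_{n=0}^{\infty} V^{\frac{3}{2}\alpha-1}(X_n) I(n < T) \\
 &\le \frac{2}{\alpha(1-\lambda)} \left( \frac{V^{\alpha}(x)}{\alpha(1-\lambda)} + \frac{K^{\alpha} - 1 - \beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} \right) \\
 &\quad + \left( \frac{2K^{\frac{\alpha}{2}} - 2 - 2\beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} - 1 \right) \left( \frac{2V^{\frac{\alpha}{2}}(x)}{\alpha(1-\lambda)} + \frac{2K^{\frac{\alpha}{2}} - 2 - 2\beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} \right).
\end{aligned}
\]

So after gathering the terms

\[
\begin{aligned}
E_x(\Xi(V^{\frac{3}{2}\alpha-1}))^2 &\le \frac{V^{2\alpha-1}(x)}{(2\alpha-1)(1-\lambda)} + \frac{4}{\alpha^2(1-\lambda)^2} V^{\alpha}(x)
 + \left( \frac{8K^{\frac{\alpha}{2}} - 8 - 8\beta}{\alpha^2\beta(1-\lambda)^2} + \frac{4 - 4\beta}{\alpha\beta(1-\lambda)} \right) V^{\frac{\alpha}{2}}(x) \\
 &\quad + \frac{K^{2\alpha-1} - 1 - \beta}{(2\alpha-1)\beta(1-\lambda)} + \frac{1}{\beta} + \frac{4(K^{\alpha} - 1 - \beta)}{\alpha^2\beta(1-\lambda)^2} + \frac{4}{\alpha\beta(1-\lambda)} \\
 &\quad + 2\left( \frac{2K^{\frac{\alpha}{2}} - 2 - 2\beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} - 1 \right) \left( \frac{2K^{\frac{\alpha}{2}} - 2 - 2\beta}{\alpha\beta(1-\lambda)} + \frac{1}{\beta} \right).
\end{aligned}
\tag{8.1}
\]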



(iv) Recall that C2(P, /)2 = E^ Ei=,i"' \fiX^)\^T < n) and we have 



Tr(„)-1 



Ec| 5] |/(X,)|I(T<. 

i—n 



(8.2) 



T = j\ P5(r = j) 



< X^eJ 5] |/(X,)| ^dT = j) 



/T-1 



5]E.P„-J ^|/(X,)| Pc(r-j). 
i=i \ 1=0 / 
Since

/T-l \2 






, i=0 



.4=0 



and $|\bar f| \le V^{\frac{3}{2}\alpha-1}$ we put (8.1) into (8.2) and apply Lemma 7.13 to complete the proof. □

Proof of Proposition 5.4. For (i) see Lemma 7.13. For (ii) we compute:



l/(^)-^/l . 1/(^)1 + k./| 

sup — , , — < sup , , 



< sup 11/11^7 



1 



< \\f\\v;il + 7r(V^)). 



□ 